Prediction in language comprehension: theory and case studies
Roger Levy, UC San Diego
Saarland University, 13 May 2013



Talk outline

• Ambiguity resolution & prediction in sentence processing
• Probabilistic grammars and surprisal as a theory unifying the two
• Application to garden-pathing
• Application to predictability-based facilitation (without ambiguity)
• Rethinking syntactic complexity with surprisal: discontinuous constituency
• Surprisal as a quantitative theory of processing difficulty
• Cloze probability and the relationship between linguistic experience and prediction


Theories of sentence comprehension

• Desiderata for a satisfactory theory of sentence comprehension:
  • Robustness to arbitrary input
  • Accurate disambiguation
  • Inference on the basis of incomplete input (incrementality)
  • Processing difficulty is differential and localized: not all sentences are equally easy to understand, nor are all parts of a given sentence equally easy to understand
• Today I will focus on the relationship of the last of these desiderata to the rest


Ambiguity and syntactic complexity

• In sentence processing research, differential difficulty is often attributed to two major sources:
  • Ambiguity resolution: a comprehender makes the wrong bet about a local ambiguity and pays for it later
    Mary punished the children of the musician who... were
  • Syntactic complexity: some part of an utterance is difficult in the absence of major sources of ambiguity
    This is the malt that the rat that the cat that the dog worried killed ate.

Incrementality and Rationality

• Online sentence comprehension is hard
• But lots of information sources can be usefully brought to bear to help with the task
• Therefore, it would be rational for people to use all the information available, whenever possible
• This is what incrementality is
• We have lots of evidence that people do this often
  "Put the apple on the towel in the box." (Tanenhaus et al., 1995, Science)


Anatomy of ye olde garden path sentence

• Classic example of incrementality in comprehension:
  The horse raced past the barn fell.
  [Tree diagrams: the "Main Verb" analysis (S → NP VP, with raced as the main verb) vs. the "Reduced Relative" analysis (the horse [that was] raced past the barn)]
  (The evidence examined by the lawyer was unreliable.)
• People fail to understand it most of the time
• People are likely to misunderstand it, e.g.:
  • "What's a barn fell?"
  • The horse that raced past the barn fell
  • The horse raced past the barn and fell

• Enter probabilistic grammars from computational linguistics...


a man arrived yesterday

0.3   S → S CC S       0.15  VP → VBD ADVP
0.7   S → NP VP        0.4   ADVP → RB
0.35  NP → DT NN       ...

[Tree diagram for the sentence, with each rule application annotated by its probability: 0.7, 0.35, 0.15, 0.3, 0.03, 0.02, 0.4, 0.07]

Total probability: 0.7 × 0.35 × 0.15 × 0.3 × 0.03 × 0.02 × 0.4 × 0.07 = 1.85×10⁻⁷
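A minimal sketch of this calculation in Python. The probabilities are the ones annotated on the slide's tree; the last four are the lexical rewrites (presumably DT → a, NN → man, VBD → arrived, RB → yesterday), whose left-hand sides are not spelled out on the slide, so the comments label them generically:

```python
import math

# Probabilities of the rules used in the slide's derivation of
# "a man arrived yesterday".
rule_probs = [
    0.7,   # S -> NP VP
    0.35,  # NP -> DT NN
    0.15,  # VP -> VBD ADVP
    0.4,   # ADVP -> RB
    0.3,   # lexical rule (value annotated on the slide's tree)
    0.03,  # lexical rule
    0.02,  # lexical rule
    0.07,  # lexical rule
]

# A tree's probability under a PCFG is the product of the
# probabilities of all rules used in its derivation.
tree_prob = math.prod(rule_probs)
print(f"{tree_prob:.3g}")  # 1.85e-07, matching the slide
```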


Probabilistic theories of ambiguity resolution

• Jurafsky (1996) introduced probabilistic grammars from computational linguistics into psycholinguistics
• For the horse raced past the barn, assume 2 incremental parses:
  [the main-verb and reduced-relative analyses]
• Jurafsky (1996) estimated the probability ratio of these parses as 82:1
• He proposed that the main-verb analysis "falls off the beam"


Quantifying probabilistic online processing difficulty

• Let a word's difficulty be its surprisal given its context:
  difficulty(wᵢ) ∝ −log P(wᵢ | w₁…wᵢ₋₁, CONTEXT)
• Captures the expectation intuition: the more we expect an event, the easier it is to process
• Brains are prediction engines!
  my brother came inside to… (chat? wash? get warm?)
  the children went outside to… (play)
• Predictable words are read faster (Ehrlich & Rayner, 1981) and have distinctive EEG responses (Kutas & Hillyard, 1980)
• Combine with probabilistic grammars to give grammatical expectations

(Hale, 2001, NAACL; Levy, 2008, Cognition)
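A worked illustration of the definition (a sketch, not from the slides: the two probabilities below are invented cloze-style values for the example continuations):

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 of the word's in-context probability."""
    return -math.log2(p)

# Hypothetical in-context probabilities:
# "the children went outside to ___" strongly predicts "play";
# "my brother came inside to ___" spreads probability over many candidates.
print(surprisal(0.9))   # ~0.15 bits: highly expected, easy
print(surprisal(0.05))  # ~4.3 bits: unexpected, harder
```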

The surprisal graph

[Plot: surprisal (−log P), y-axis from 0 to 4, against probability, x-axis from 0 to 1.0; surprisal falls as probability rises, growing without bound as probability approaches 0.]


Garden-pathing and surprisal

• Here's another type of local syntactic ambiguity:
  When the dog scratched the vet and his new assistant removed the muzzle.
  [difficulty at the disambiguating region: 68 ms/char]
• Compare with:
  When the dog scratched, the vet and his new assistant removed the muzzle.
  When the dog scratched its owner the vet and his new assistant removed the muzzle.
  [easier: 50 ms/char]

(Frazier & Rayner, 1982)


A small PCFG for this sentence type

S → SBAR S 0.3             Conj → and 1         Adj → new 1
S → NP VP 0.7              Det → the 0.8        VP → V NP 0.5
SBAR → COMPL S 0.3         Det → its 0.1        VP → V 0.5
SBAR → COMPL S COMMA 0.7   Det → his 0.1        V → scratched 0.25
COMPL → When 1             N → dog 0.2          V → removed 0.25
NP → Det N 0.6             N → vet 0.2          V → arrived 0.5
NP → Det Adj N 0.2         N → assistant 0.2    COMMA → , 1
NP → NP Conj NP 0.2        N → muzzle 0.2       N → owner 0.2

(analysis in Levy, 2011)


Two incremental trees

• "Garden-path" analysis: [tree in which the vet and his new assistant is the object of scratched inside the When-clause, with the main-clause NP and VP still to come]
  P(T | w₁…₁₀) = 0.826
• Ultimately-correct analysis: [tree in which the When-clause ends at scratched and the vet and his new assistant is the subject of the main clause, whose verb is still to come]
  P(T | w₁…₁₀) = 0.174

Disambiguating word probability marginalizes over incremental trees:

P(removed | w₁…₁₀) = Σ_T P(removed | T) P(T | w₁…₁₀)
                   = 0.826 × 0 + 0.174 × 0.25
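A minimal sketch of this marginalization in Python (the probabilities are the slide's; the dictionary structure is just illustrative):

```python
import math

# Posterior over incremental trees after "When the dog scratched
# the vet and his new assistant" (w1...w10), from the slide.
tree_posterior = {"garden_path": 0.826, "correct": 0.174}

# P(removed | T): under the garden-path tree the object slot is already
# filled, so "removed" is impossible; under the correct tree the main
# verb comes next, and V -> removed has probability 0.25.
p_removed_given_tree = {"garden_path": 0.0, "correct": 0.25}

p_removed = sum(p_removed_given_tree[t] * p for t, p in tree_posterior.items())
print(p_removed)              # 0.0435
print(-math.log2(p_removed))  # surprisal at "removed", ~4.52 bits
```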

Preceding context can disambiguate

• "its owner" takes up the object slot of scratched
  [Tree: When the dog scratched its owner, with the vet and his new assistant as the main-clause subject and its VP still to come]

Condition     Surprisal at resolution
NP absent     4.2
NP present    2


Sensitivity to verb argument structure

• A superficially similar example:
  When the dog arrived the vet and his new assistant removed the muzzle.
  [easier at the disambiguating verb removed, but harder at the vet!]
  (c.f. When the dog scratched the vet and his new assistant removed the muzzle.)

(Staub, 2007)


Modeling argument-structure sensitivity

[Same small PCFG as above, with the rules below replaced]

• The "context-free" assumption doesn't preclude relaxing probabilistic locality (Johnson, 1999; Klein & Manning, 2003):

Original rules           Replaced by
VP → V NP 0.5            VP → Vtrans NP 0.45
VP → V 0.5               VP → Vtrans 0.05
V → scratched 0.25       VP → Vintrans 0.45
V → removed 0.25         VP → Vintrans NP 0.05
V → arrived 0.5          Vtrans → scratched 0.5
                         Vtrans → removed 0.5
                         Vintrans → arrived 1
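A toy illustration of why this refinement matters (a sketch using only the refined VP-internal rules above; it ignores the competing parse in which the vet starts the main clause, so these numbers are not the full-model surprisals reported on the next slide):

```python
import math

# P(the verb is followed by an object NP | verb class), read off the
# refined VP rules: VP -> Vtrans NP 0.45 vs. VP -> Vtrans 0.05, and
# VP -> Vintrans NP 0.05 vs. VP -> Vintrans 0.45.
p_np_given_trans = 0.45 / (0.45 + 0.05)    # 0.9
p_np_given_intrans = 0.05 / (0.45 + 0.05)  # 0.1

# Surprisal (bits) of encountering an object NP after each verb type:
print(-math.log2(p_np_given_trans))    # ~0.15: "the vet" easy after scratched
print(-math.log2(p_np_given_intrans))  # ~3.32: "the vet" hard after arrived
```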

Result

When the dog arrived the vet and his new assistant removed the muzzle.
When the dog scratched the vet and his new assistant removed the muzzle.
[ambiguity onset: the vet; ambiguity resolution: removed]

Transitivity-distinguishing PCFG:

Condition                 Ambiguity onset    Resolution
Intransitive (arrived)    2.11               3.20
Transitive (scratched)    0.44               8.04

Move to broad coverage

• Instead of the pedagogical grammar, a "broad-coverage" grammar from the parsed Brown corpus (11,984 rules)
• Relative-frequency estimation of rule probabilities ("vanilla" PCFG)

[Plot over the region the vet and his new assistant removed the muzzle: transitive − intransitive first-pass reading time (ms, scale −60 to 60) alongside transitive − intransitive surprisal (bits, scale 1 to −1)]


Surprisal and syntactic expectations without ambiguity

• Let's consider the variation in pre-verbal dependency structure found in German:

  Die Einsicht, dass der Freund dem Kunden das Auto aus Plastik verkaufte, erheiterte die Anderen.
  the insight   that the.NOM friend the.DAT client the.ACC car of plastic sold, amused the others
  'The insight that the friend sold the client the plastic car amused the others.'

(Konieczny & Döring, 2003)


What happens in German final-verb processing?

...daß der Freund DEM Kunden das Auto verkaufte
...that the friend the client the car sold
'...that the friend sold the client a car...'

...daß der Freund DES Kunden das Auto verkaufte
...that the friend the client the car sold
'...that the friend of the client sold a car...'

What does reducing the number of dependencies (changing dem → des) do to processing at the final verb? Make it easier because the dependency structure is simpler? No: it makes it harder!

(Konieczny & Döring, 2003)

[Figure: step-by-step incremental parses of the dative (DEM Kunden) and genitive (DES Kunden) variants, from daß through der Freund, DEM/DES Kunden, and das Auto, to verkaufte. At each step the parser's set of predicted next categories (NPnom, NPacc, NPdat, PP, ADVP, Verb) is shown; DEM Kunden attaches as a dative argument of the upcoming VP, while DES Kunden attaches as a genitive modifier inside the subject NP.]

Model results

Condition               Reading time (ms)   P(wᵢ): word probability   Locality-based prediction
dem Kunden (dative)     555                 8.38×10⁻⁸                 slower
des Kunden (genitive)   793                 6.35×10⁻⁸                 faster

~30% greater expectation in the dative condition; once again, the locality-based prediction has the wrong monotonicity.
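The ~30% figure follows directly from the two word probabilities (a sketch; "expectation" here is just the ratio of the verb's probabilities in the two conditions, and the difference is also shown as surprisal in bits):

```python
import math

p_verb_dative = 8.38e-8    # P(verkaufte | ...dem Kunden das Auto)
p_verb_genitive = 6.35e-8  # P(verkaufte | ...des Kunden das Auto)

print(p_verb_dative / p_verb_genitive - 1)         # ~0.32: ~30% greater expectation
print(math.log2(p_verb_dative / p_verb_genitive))  # ~0.40 bits lower surprisal (dative)
```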

Case study: discontinuous dependencies

• Most word-word dependencies in most sentences of most languages are projective
• Formally, a set of word-word dependencies is projective iff they do not cross
• However, sometimes dependencies are non-projective or discontinuous; that is, they cross

(Levy, Fedorenko, Breen, & Gibson, 2012)


Rethinking locality: RC extraposition

• Equipped with surprisal, let's consider the case of discontinuous dependencies
• Example: Levy et al. (2012) found consistent difficulty effects induced by RC extraposition
  [example sentences: the in-situ relative clause is easy; the extraposed relative clause is hard]
• Is this evidence for a special type of locality: a phrasal adjacency constraint (or a constraint against crossing dependencies)?

(Levy, Fedorenko, Breen, & Gibson, 2012)


Probability & extraposition

• But RC extraposition is relatively rare in English:
  In situ: P_VP(RC | NP) = 0.06    Extraposed: P_VP(RC | NP, PP) = 0.003
  (estimated from the parsed Brown corpus)
• Alternative hypothesis: processing extraposed RCs is hard because they're unexpected
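Those corpus estimates translate into a large expectation difference (a sketch; surprisals in bits):

```python
import math

p_rc_in_situ = 0.06      # P_VP(RC | NP), parsed Brown corpus
p_rc_extraposed = 0.003  # P_VP(RC | NP, PP)

print(-math.log2(p_rc_in_situ))     # ~4.1 bits
print(-math.log2(p_rc_extraposed))  # ~8.4 bits: far more surprising
```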


Testing the role of expectations

• If extraposed RCs are hard because they're unexpected…
• …then making them more expected should make them easier
• Work by Wasow, Jaeger, and colleagues (Wasow et al., 2005; Levy & Jaeger, 2007) has found that premodifier type can affect expectation for (in-situ) RCs:
  a barber…          low RC expectation
  the barber…        higher RC expectation
  the only barber…   very high RC expectation
• If premodifier-induced expectations are carried over past the continuous NP domain, we may be able to manipulate extraposed RC expectations the same way
  [example sentences: RC less expected vs. RC more expected]


Experimental design

• We crossed RC expectation (low/high) with RC extraposition (extraposed/unextraposed)
• Example sentence: The chairman consulted…
  [the four conditions of the example item]
• Our prediction is an interactive effect: high RC expectation ("only those") will facilitate RC reading, but only in the extraposed condition
• We tested this in a self-paced reading study

(Levy, Fedorenko, Breen, & Gibson, 2012)


Experimental results

• We see the interaction! (interaction p's ≤ 0.025)
• When an RC is less expected, the extraposed variant (executives←) is harder [penalty]
• When more expected, it's not [no penalty]
• Alternatively, we can think of expectation as facilitating processing for the extraposed variant [facilitation]

[Reading-time plot with penalty, no penalty, and facilitation annotations]

Experiment: Discussion

• Increasing the expectation for an RC facilitates the processing of extraposed RCs
• True even though the extraposed RC is outside of the continuous-constituent NP domain
• The first real evidence that syntactic prediction is extended beyond the domain of continuous constituents

• Why are (some kinds of) discontinuous constituents hard?
  • One possibility: locality & phrasal adjacency constraints
  • New possibility: driven by probabilistic expectations


Surprisal vs. predictability in general

• But is there evidence for surprisal as the specific function relating probability to processing difficulty?

(Smith & Levy, in press)


Proposed probability-time relationships• Linear?• Assumed by major models of eye movement control in

reading (Reichle, Pollatsek, Fisher, & Rayner, 1998; Engbert, Nuthmann, Richter, & Kliegl, 2005)• Predicted by simple “guessing” theories

32

the children went outside to… ...play?

(Smith & Levy, in press)

...play

(90% of the time)

...play

(10% of the time)

32Friday, May 17, 13
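Why simple guessing predicts a linear probability-time relationship: if the comprehender bets on one continuation and pays a fixed recovery cost whenever the bet is wrong, expected processing time is a straight line in the word's probability p. A back-of-the-envelope sketch, with made-up costs:

    def expected_time_guessing(p, t_right=50.0, t_wrong=300.0):
        # Guess confirmed with probability p (cheap), disconfirmed with
        # probability 1 - p (fixed recovery cost); the result is linear in p
        return p * t_right + (1.0 - p) * t_wrong

    for p in (0.9, 0.5, 0.1, 0.01):
        print(p, expected_time_guessing(p), "ms")
    # Equal decrements in probability cost equal increments of time,
    # unlike the logarithmic relationship tested below.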

Pages 166-167: Prediction in language comprehension: theory and case studies

Theories of word-time relationship

• Logarithmic?
• Theory 1: Optimal perceptual discrimination ("what is this word?"; Stone, 1960; Laming, 1968; Norris, 2006) (simulated below)

[Figure: posterior probability of correct word (0.0 to 1.0) vs. # time steps elapsed (0 to 1200), with a decision threshold; one curve per word surprisal: 1 bit, 3 bits, 5 bits, 7 bits]
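The logic behind Theory 1 can be simulated: an optimal recognizer starts at the word's prior log-odds and accumulates perceptual evidence until the posterior for the correct word crosses a decision threshold. In a deterministic idealization (evidence arrives at a constant rate; all settings below are arbitrary), time to threshold grows roughly linearly with the word's surprisal, i.e., logarithmically with its probability:

    import math

    def steps_to_threshold(p_prior, threshold=0.95, bits_per_step=0.02):
        # Start at the word's prior log-odds; gain a fixed amount of
        # evidence per time step; stop once the posterior passes threshold
        logodds = math.log2(p_prior / (1 - p_prior))
        target = math.log2(threshold / (1 - threshold))
        steps = 0
        while logodds < target:
            logodds += bits_per_step
            steps += 1
        return steps

    for bits in (1, 3, 5, 7):
        print(bits, "bits ->", steps_to_threshold(2.0 ** -bits), "steps")
    # Each added bit of surprisal costs a roughly constant number of
    # extra steps, as in the figure's family of curves.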

Pages 168-169: Prediction in language comprehension: theory and case studies

Theories of word-time relationship

• Logarithmic?
• Theory 2: highly incremental processing (Smith & Levy, in press) (see the sketch below)
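Theory 2's route to the logarithm, glossed in one line of algebra: if a word is processed as a chain of many small increments, the increments' probabilities multiply while their costs add, and a summed cost tracks the log of the product, i.e., the word's total surprisal. A toy sketch with hypothetical sub-word increments:

    import math

    # Hypothetical conditional probabilities of three sub-word
    # processing increments for a single word
    increments = [0.5, 0.8, 0.25]

    p_word = math.prod(increments)
    cost = sum(-math.log2(q) for q in increments)

    print(p_word)              # 0.1
    print(cost)                # ~3.32 bits
    print(-math.log2(p_word))  # identical: additive costs track log probability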

Pages 170-172: Prediction in language comprehension: theory and case studies

Other proposed probability-time rel'nships

Pages 173-176: Prediction in language comprehension: theory and case studies

Estimating probability/time curve shape

• As a proxy for "processing difficulty," reading time in two different methods: self-paced reading & eye-tracking
• Challenge: we need big data to estimate curve shape, but probability is correlated with confounding variables

Corpora: 5K words (self-paced reading), 50K words (eye-tracking)

Page 177: Prediction in language comprehension: theory and case studies

Estimating probability/time curve shape

• GAM regression: total contribution of word (trigram) probability to RT near-linear over 6 orders of magnitude! (Smith & Levy, in press) (a toy re-creation below)

[Figure: total amount of slowdown (ms, 0 to 80) vs. P(word | context) (10^-6 to 1, log scale); left panel: reading times in self-paced reading; right panel: gaze durations in eye-tracking]
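The curve-shape estimate can be reproduced in miniature with any GAM package: regress reading time on a smooth (spline) function of log probability and inspect whether the fitted smooth comes out straight. A toy sketch using pygam on simulated data (the published analysis used far richer controls for word length, frequency, position, etc.):

    import numpy as np
    from pygam import LinearGAM, s

    rng = np.random.default_rng(0)

    # Simulated data: log10 P(word | context) in [-6, 0], with reading
    # times that are, by construction, linear in log probability
    log10_p = rng.uniform(-6, 0, 5000)
    rt = 330 - 12 * log10_p + rng.normal(0, 40, 5000)

    # Fit a spline of log probability; a near-straight fitted smooth
    # means the probability-time relationship is logarithmic
    gam = LinearGAM(s(0)).fit(log10_p.reshape(-1, 1), rt)

    grid = np.linspace(-6, 0, 7).reshape(-1, 1)
    print(gam.predict(grid))  # roughly evenly spaced values => near-linear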

Pages 178-185: Prediction in language comprehension: theory and case studies

Implications for different theories

• Not good for guessing theories
• Not good for the reciprocal theory
• Not good for the super-logarithmic theory of UID
• But UID could still be rescued by an "optimal alignment with the speaker" view
• Good for theories based on:
  • Optimal perceptual discrimination
  • Highly incremental processing

[Figure: the same two panels as above: total slowdown (ms) vs. P(word | context) in self-paced reading and eye-tracking]

Page 186: Prediction in language comprehension: theory and case studies

Implications for predictability norming


Pages 187-188: Prediction in language comprehension: theory and case studies

Implications for higher-level processing

• I discussed this test of surprisal in terms of lexical predictability
• But let's revisit syntactic expectations
• We argued that extraposed who was difficult because of syntactic expectations
• Lexically, who is "unpredictable" in both cases, but extraposed who is many bits more surprising

Pages 189-194: Prediction in language comprehension: theory and case studies

Final case study: Cloze and linguistic experience

the children went outside to…   (Cloze completions from different participants: play, eat, play, play)

(Smith & Levy, 2011)

Pages 195-199: Prediction in language comprehension: theory and case studies

Cloze and linguistic experience

• To understand the relationship among these, we want to compare "ground truth" corpus probabilities to Cloze continuations (estimator sketched below)
  • Google Web n-grams
  • Google Books n-grams
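The "ground truth" side is conditional relative frequency over n-gram counts, e.g. P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2) for trigrams. A toy sketch of the estimator over a tiny invented corpus (the actual comparison drew counts from the Google Web and Books n-gram collections):

    from collections import Counter

    corpus = ("the children went outside to play and "
              "the children went outside to eat").split()

    trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_trigram(w1, w2, w3):
        # Maximum-likelihood estimate of P(w3 | w1 w2)
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

    print(p_trigram("outside", "to", "play"))  # 0.5 in this toy corpus
    print(p_trigram("outside", "to", "eat"))   # 0.5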

Page 200: Prediction in language comprehension: theory and case studies

Example contexts & method

• Collect lots of completions to different contexts (see the sketch below)
• Fit a multivariate model to predict the completions

In the winter and ______
It was no great ______
He played a key ______
The time needed to ______
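Cloze probabilities for contexts like these are simply relative frequencies over participants' completions. A minimal sketch with hypothetical responses:

    from collections import Counter

    # Hypothetical completions of "the children went outside to ______"
    completions = ["play", "play", "eat", "play", "swim", "play"]

    counts = Counter(completions)
    cloze = {w: c / len(completions) for w, c in counts.items()}
    print(cloze)  # {'play': 0.667, 'eat': 0.167, 'swim': 0.167} (rounded)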

Pages 201-202: Prediction in language comprehension: theory and case studies

Results

Pages 203-208: Prediction in language comprehension: theory and case studies

What predicts reading times?

• We ran a self-paced reading study with the same materials (analysis shape sketched below)
• Cloze probabilities significantly predicted target word RTs
• Corpus probabilities did not
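The predictor comparison amounts to putting log Cloze probability and log corpus probability into the same regression on target-word RTs and seeing which carries the weight. A schematic version with statsmodels on simulated data (the actual study used the experimental items with appropriate controls; this is only the shape of the analysis):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 400

    # Simulated log probabilities: cloze and corpus estimates correlate
    # but diverge
    log_cloze = rng.uniform(-8, 0, n)
    log_corpus = log_cloze + rng.normal(0, 2, n)

    # Simulated RTs driven by cloze alone, mimicking the reported result
    rt = 350 - 10 * log_cloze + rng.normal(0, 30, n)

    X = sm.add_constant(np.column_stack([log_cloze, log_corpus]))
    fit = sm.OLS(rt, X).fit()
    print(fit.params)   # large weight on cloze, near-zero on corpus
    print(fit.pvalues)  # cloze significant, corpus not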

Page 209: Prediction in language comprehension: theory and case studies

General summary

• Probabilistic grammars and surprisal theory unify ambiguity resolution and prediction
• Prediction takes into account rich syntactic & semantic contexts
• Striking quantitative support for surprisal as the right index of incremental processing difficulty
• Surprisal unifies grammatical expectations and lexical predictability
• The relationship between measured estimates of linguistic experience and human prediction is non-trivial

Page 210: Prediction in language comprehension: theory and case studies

Acknowledgments

• Collaborators:
  • Nathaniel Smith
  • Evelina Fedorenko
  • Mara Breen
  • Ted Gibson
• Funding:
  • National Science Foundation
  • National Institutes of Health (NICHD)
  • Alfred P. Sloan Foundation
• UCSD Computational Psycholinguistics Lab

Page 211: Prediction in language comprehension: theory and case studies

Thank you!

http://idiom.ucsd.edu/~rlevy
http://grammar.ucsd.edu/cpl