Prediction in language comprehension: theory and case studies
Roger Levy, UC San Diego
Saarland University, 13 May 2013
Talk outline
• Ambiguity resolution & prediction in sentence processing
• Probabilistic grammars and surprisal as a theory unifying the two
• Application to garden-pathing
• Application to predictability-based facilitation (without ambiguity)
• Rethinking syntactic complexity with surprisal: discontinuous constituency
• Surprisal as a quantitative theory of processing difficulty
• Cloze probability and the relationship between linguistic experience and prediction
Theories of sentence comprehension
• Desiderata for a satisfactory theory of sentence comprehension:
  • Robustness to arbitrary input
  • Accurate disambiguation
  • Inference on the basis of incomplete input (incrementality)
  • Processing difficulty is differential and localized: not all sentences are equally easy to understand, nor are all parts of a given sentence equally easy to understand
• Today I will focus on the relationship of the last of these desiderata to the rest
Ambiguity and syntactic complexity
• In sentence processing research, differential difficulty is often attributed to two major sources:
• Ambiguity resolution: a comprehender makes the wrong bet about a local ambiguity and pays for it later
  Mary punished the children of the musician who... were
• Syntactic complexity: some part of an utterance is difficult in the absence of major sources of ambiguity
  This is the malt that the rat that the cat that the dog worried killed ate.
Incrementality and Rationality
• Online sentence comprehension is hard
• But lots of information sources can be usefully brought to bear to help with the task
• Therefore, it would be rational for people to use all the information available, whenever possible
• This is what incrementality is
• We have lots of evidence that people do this often
  “Put the apple on the towel in the box.” (Tanenhaus et al., 1995, Science)
Anatomy of ye olde garden path sentence
• Classic example of incrementality in comprehension
  The horse raced past the barn fell.
• Two incremental analyses of the ambiguous prefix:
  • “Main Verb”: [S [NP The horse] [VP raced past the barn]]
  • “Reduced Relative”: [S [NP The horse (that was) raced past the barn] [VP fell]]
  (The evidence examined by the lawyer was unreliable.)
• People fail to understand it most of the time
• People are likely to misunderstand it—e.g., “What’s a barn fell?”
  • The horse that raced past the barn fell
  • The horse raced past the barn and fell
• Enter probabilistic grammars from computational linguistics...
a man arrived yesterday
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
0.7
0.150.35
0.40.3 0.03 0.02
0.07
Total probability: 0.7*0.35*0.15*0.3*0.03*0.02*0.4*0.07= 1.85×10-7
8Friday, May 17, 13
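As a minimal sketch (mine, not part of the deck), the derivation probability can be computed directly; which lexical rule carries which of the probabilities 0.3, 0.03, 0.02, 0.07 is my reading of the tree annotations above.

```python
from functools import reduce

# Rule probabilities used in the single derivation of "a man arrived yesterday".
# Phrasal-rule probabilities come from the grammar fragment above; the mapping
# of 0.3, 0.03, 0.02, 0.07 onto the lexical rules is inferred from the tree.
rules = {
    "S -> NP VP": 0.7,
    "NP -> DT NN": 0.35,
    "VP -> VBD ADVP": 0.15,
    "ADVP -> RB": 0.4,
    "DT -> a": 0.3,
    "NN -> man": 0.03,
    "VBD -> arrived": 0.02,
    "RB -> yesterday": 0.07,
}

# A PCFG scores a derivation as the product of its rule probabilities.
p_tree = reduce(lambda a, b: a * b, rules.values(), 1.0)
print(f"P(tree) = {p_tree:.3g}")  # 1.85e-07, as on the slide
```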
Probabilistic theories of ambiguity resolution
• Jurafsky (1996) introduced probabilistic grammars from computational linguistics into psycholinguistics
• For the horse raced past the barn, assume 2 incremental parses: [main-verb and reduced-relative]
• Jurafsky (1996) estimated the probability ratio of these parses as 82:1
• He proposed that the disfavored reduced-relative analysis “falls off the beam”, producing the garden path when fell arrives
Quantifying probabilistic online processing difficulty
• Let a word’s difficulty be its surprisal given its context:
  difficulty(wi) ∝ −log P(wi | w1…wi−1, CONTEXT)
• Captures the expectation intuition: the more we expect an event, the easier it is to process
• Brains are prediction engines!
  my brother came inside to… chat? wash? get warm?
  the children went outside to… play
• Predictable words are read faster (Ehrlich & Rayner, 1981) and have distinctive EEG responses (Kutas & Hillyard, 1980)
• Combine with probabilistic grammars to give grammatical expectations
(Hale, 2001, NAACL; Levy, 2008, Cognition)
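A minimal sketch of the definition (mine, not the deck’s): surprisal in bits from a conditional probability. The probabilities below are invented for illustration.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(p)

# Illustrative (invented) conditional probabilities for
# "the children went outside to ...":
for word, p in [("play", 0.9), ("eat", 0.05), ("chat", 0.005)]:
    print(f"{word:>5}: P = {p:<6} surprisal = {surprisal(p):5.2f} bits")
```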
The surprisal graph
[Graph: surprisal (−log P, y-axis from 0 to 4) as a function of probability (x-axis from 0 to 1): a decreasing, convex curve, steepest as P → 0]
Garden-pathing and surprisal
• Here’s another type of local syntactic ambiguity:
  When the dog scratched the vet and his new assistant removed the muzzle.
  (difficulty at “removed”: 68 ms/char)
• Compare with:
  When the dog scratched, the vet and his new assistant removed the muzzle.
  When the dog scratched its owner the vet and his new assistant removed the muzzle.
  (easier at “removed”: 50 ms/char)
(Frazier & Rayner, 1982)
A small PCFG for this sentence type (analysis in Levy, 2011)

  S → SBAR S             0.3     NP → NP Conj NP   0.2     VP → V NP       0.5
  S → NP VP              0.7     Det → the         0.8     VP → V          0.5
  SBAR → COMPL S         0.3     Det → its         0.1     V → scratched   0.25
  SBAR → COMPL S COMMA   0.7     Det → his         0.1     V → removed     0.25
  COMPL → When           1       N → dog           0.2     V → arrived     0.5
  NP → Det N             0.6     N → vet           0.2     Conj → and      1
  NP → Det Adj N         0.2     N → assistant     0.2     Adj → new       1
                                 N → muzzle        0.2     COMMA → ,       1
                                 N → owner         0.2
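The toy grammar is small enough to run. Here is a sketch (my construction, not part of the deck) that builds it with NLTK and scores the full garden-path sentence with a Viterbi parser; note this scores complete parses, not the incremental prefix probabilities used in the surprisal computations below.

```python
import nltk  # assumes nltk is installed

grammar = nltk.PCFG.fromstring("""
    S -> SBAR S [0.3] | NP VP [0.7]
    SBAR -> COMPL S [0.3] | COMPL S COMMA [0.7]
    COMPL -> 'When' [1.0]
    NP -> Det N [0.6] | Det Adj N [0.2] | NP Conj NP [0.2]
    VP -> V NP [0.5] | V [0.5]
    Det -> 'the' [0.8] | 'its' [0.1] | 'his' [0.1]
    N -> 'dog' [0.2] | 'vet' [0.2] | 'assistant' [0.2] | 'muzzle' [0.2] | 'owner' [0.2]
    V -> 'scratched' [0.25] | 'removed' [0.25] | 'arrived' [0.5]
    Conj -> 'and' [1.0]
    Adj -> 'new' [1.0]
    COMMA -> ',' [1.0]
""")

parser = nltk.ViterbiParser(grammar)
tokens = "When the dog scratched the vet and his new assistant removed the muzzle".split()
for tree in parser.parse(tokens):
    print(f"P(tree) = {tree.prob():.3g}")
    tree.pretty_print()
```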
Two incremental trees
• “Garden-path” analysis: [tree in which the vet and his new assistant is the object of scratched inside the When-clause; the main-clause subject NP and VP are still to come]
  P(T | w1...10) = 0.826
• Ultimately-correct analysis: [tree in which the When-clause ends at scratched (VP → V) and the vet and his new assistant is the main-clause subject; the main verb (removed?) is expected next]
  P(T | w1...10) = 0.174
• Disambiguating word probability marginalizes over incremental trees:
  P(removed | w1...10) = Σ_T P(removed | T) P(T | w1...10)
                       = 0.826 × 0 + 0.174 × 0.25
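The marginalization spelled out in code (a sketch; the surprisal value is my computation from the slide’s rounded posteriors, while the deck’s summary table on the next slide lists 4.2 bits, so exact values evidently depend on details of Levy (2011)’s computation):

```python
import math

# Posterior over the two incremental trees after word 10, from the slide:
p_trees = {"garden-path": 0.826, "correct": 0.174}

# P(removed | tree): impossible under the garden-path tree (the next word
# must start the main-clause NP); under the correct tree, V -> removed
# has probability 0.25.
p_removed_given_tree = {"garden-path": 0.0, "correct": 0.25}

p_removed = sum(p_trees[t] * p_removed_given_tree[t] for t in p_trees)
print(f"P(removed | w1..10) = {p_removed:.4f}")         # 0.0435
print(f"surprisal = {-math.log2(p_removed):.2f} bits")  # ~4.5
```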
Preceding context can disambiguate
• “its owner” takes up the object slot of scratched
  [tree: in When the dog scratched its owner, the object position is filled, so the vet and his new assistant can only be the main-clause subject]

  Condition     Surprisal at resolution
  NP absent     4.2
  NP present    2
Sensitivity to verb argument structure
• A superficially similar example:
  When the dog arrived the vet and his new assistant removed the muzzle.
  (easier at the disambiguating verb “removed”, but harder at the ambiguity onset “the vet”!)
  (c.f. When the dog scratched the vet and his new assistant removed the muzzle.)
(Staub, 2007)
Modeling argument-structure sensitivity
• The “context-free” assumption doesn’t preclude relaxing probabilistic locality: split the V and VP rules by transitivity
  (Johnson, 1999; Klein & Manning, 2003)

  Replaced:                      By:
  VP → V NP         0.5     ⇒    VP → Vtrans NP        0.45
  VP → V            0.5          VP → Vtrans           0.05
  V → scratched     0.25         VP → Vintrans         0.45
  V → removed       0.25         VP → Vintrans NP      0.05
  V → arrived       0.5          Vtrans → scratched    0.5
                                 Vtrans → removed      0.5
                                 Vintrans → arrived    1
Result

When the dog arrived the vet and his new assistant removed the muzzle.
When the dog scratched the vet and his new assistant removed the muzzle.
(ambiguity onset: “the vet”; ambiguity resolution: “removed”)

Transitivity-distinguishing PCFG surprisals:
  Condition                  Ambiguity onset   Resolution
  Intransitive (arrived)     2.11              3.20
  Transitive (scratched)     0.44              8.04
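A back-of-the-envelope check (mine) of why the split grammar produces this asymmetry: conditioned on the verb’s class, how strongly is a following NP expected? This ignores everything else in the prefix, so it reproduces only the qualitative pattern in the table, not the exact surprisals.

```python
import math

# P(an NP follows the verb | verb class), from the split VP rules above:
p_np_follows = {
    "intransitive (arrived)": 0.05 / (0.45 + 0.05),  # VP -> Vintrans NP vs. Vintrans
    "transitive (scratched)": 0.45 / (0.45 + 0.05),  # VP -> Vtrans NP vs. Vtrans
}
for cond, p in p_np_follows.items():
    print(f"{cond}: P(NP) = {p:.2f}, surprisal = {-math.log2(p):.2f} bits")
# intransitive: P = 0.10 -> 3.32 bits; transitive: P = 0.90 -> 0.15 bits
```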
Move to broad coverage
• Instead of the pedagogical grammar, a “broad-coverage” grammar from the parsed Brown corpus (11,984 rules)
• Relative-frequency estimation of rule probabilities (“vanilla” PCFG)
[Plot: Transitive RT − Intransitive RT (ms, −60 to 60) and Transitive surprisal − Intransitive surprisal (bits, −1 to 1) across the region “the vet and his new assistant removed the muzzle”; first-pass reading time and surprisal differences pattern together]
Surprisal and syntactic expectations without ambiguity
• Let’s consider the variation in pre-verbal dependency structure found in German:

  Die Einsicht, dass der     Freund dem     Kunden das     Auto aus Plastik verkaufte, erheiterte die Anderen.
  the insight   that the.NOM friend the.DAT client the.ACC car  of  plastic sold       amused     the others
  ‘The insight that the friend sold the client the plastic car amused the others.’

(Konieczny & Döring, 2003)
What happens in German final-verb processing?

  ...daß der Freund DEM Kunden das Auto verkaufte
  ...that the friend the.DAT client the car sold
  ‘...that the friend sold the client a car...’

  ...daß der Freund DES Kunden das Auto verkaufte
  ...that the friend the.GEN client the car sold
  ‘...that the friend of the client sold a car...’

What does reducing the number of dependencies (changing dem → des) do to processing at the final verb? Make it easier because the dependency structure is simpler? No: it makes it harder!

(Konieczny & Döring, 2003)
[Incremental parse diagrams for the two conditions: after daß (SBAR → COMP …) and der Freund (NPnom), the parser expects some subset of {NPnom, NPacc, NPdat, PP, ADVP, Verb} next. DEM Kunden attaches as an NPdat argument of the upcoming VP, whereas DES Kunden attaches as an NPgen modifier inside the subject NP; das Auto then attaches as NPacc, and verkaufte as the final V. In the dative condition more of the verb’s dependents are already in place, sharpening the expectation for the clause-final verb.]
Model results

  Condition                Reading time (ms)   P(wi): word probability   Locality-based prediction
  dem Kunden (dative)      555                 8.38×10⁻⁸                 slower
  des Kunden (genitive)    793                 6.35×10⁻⁸                 faster

• ~30% greater expectation for the verb in the dative condition
• Once again, locality-based predictions have the wrong monotonicity
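Converting the table’s word probabilities to surprisals (my arithmetic from the slide’s numbers):

```python
import math

p_final_verb = {
    "dem Kunden (dative)":   8.38e-8,
    "des Kunden (genitive)": 6.35e-8,
}
for cond, p in p_final_verb.items():
    print(f"{cond}: surprisal = {-math.log2(p):.2f} bits")
# dative ~23.5 bits < genitive ~23.9 bits: the verb is *more* expected in
# the dative condition (8.38/6.35 ~ 1.32, i.e. ~30% greater probability),
# matching the faster reading times there.
```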
Case study: discontinuous dependencies
• Most word-word dependencies in most sentences of most languages are projective
• Formally, a set of word-word dependencies is projective iff they do not cross
• However, sometimes dependencies are non-projective, or discontinuous; that is, they cross
(Levy, Fedorenko, Breen, & Gibson, 2012)
Rethinking locality: RC extraposition
• Equipped with surprisal, let’s consider the case of discontinuous dependencies
• Example: Levy et al. (2012) found consistent difficulty effects induced by RC extraposition
  [in-situ relative clause: easy; extraposed relative clause: hard]
• Is this evidence for a special type of locality: a phrasal adjacency constraint (or a constraint against crossing dependencies)?
(Levy, Fedorenko, Breen, & Gibson, 2012)
Probability & extraposition
• But RC extraposition is relatively rare in English
  In situ: PVP(RC | NP) = 0.06    Extraposed: PVP(RC | NP, PP) = 0.003
  (estimated from the parsed Brown corpus)
• Alternative hypothesis: processing extraposed RCs is hard because they’re unexpected
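On the expectation-based account, the rarity itself is the difficulty; in surprisal terms (my arithmetic from the corpus estimates above):

```python
import math

p_rc = {"in situ": 0.06, "extraposed": 0.003}
for cond, p in p_rc.items():
    print(f"{cond}: surprisal = {-math.log2(p):.2f} bits")
# in situ ~4.1 bits vs. extraposed ~8.4 bits: roughly 4 extra bits for an
# extraposed RC, before any locality constraint is invoked.
```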
Testing the role of expectations
• If extraposed RCs are hard because they’re unexpected…
• …then making them more expected should make them easier
• Work by Wasow, Jaeger, and colleagues (Wasow et al., 2005; Levy & Jaeger, 2007) has found that premodifier type can affect expectation for (in-situ) RCs:
  a barber…          low RC expectation
  the barber…        higher RC expectation
  the only barber…   very high RC expectation
• If premodifier-induced expectations are carried over past the continuous NP domain, we may be able to manipulate extraposed RC expectations the same way*
  [example stimuli: RC less expected vs. RC more expected]
Experimental design
• We crossed RC expectation (low/high) with RC extraposition (extraposed/unextraposed)
• Example sentence: The chairman consulted…
• Our prediction is an interactive effect: high RC expectation (“only those”) will facilitate RC reading, but only in the extraposed condition
• We tested this in a self-paced reading study
(Levy, Fedorenko, Breen, & Gibson, 2012)
Experimental results
• We see the interaction! (interaction p’s ≤ 0.025)
• When an RC is less expected, the extraposed variant (the RC reaching back to “executives”) is harder: a penalty
• When more expected, it’s not: no penalty
• Alternatively, we can think of expectation as facilitating processing for the extraposed variant: facilitation
Experiment: Discussion
• Increasing the expectation for an RC facilitates the processing of extraposed RCs
• True even though the extraposed RC is outside of the continuous-constituent NP domain
• The first real evidence that syntactic prediction extends beyond the domain of continuous constituents
• Why are (some kinds of) discontinuous constituents hard?
  • One possibility: locality & phrasal adjacency constraints
  • New possibility: driven by probabilistic expectations
Surprisal vs. predictability in general
• But is there evidence for surprisal as the specific function relating probability to processing difficulty?
(Smith & Levy, in press)
Proposed probability-time relationships
• Linear?
  • Assumed by major models of eye movement control in reading (Reichle, Pollatsek, Fisher, & Rayner, 1998; Engbert, Nuthmann, Richter, & Kliegl, 2005)
  • Predicted by simple “guessing” theories: for the children went outside to…, guess …play in advance; the guess is right (90% of the time) or wrong (10% of the time), so expected processing time is linear in the word’s probability (see the sketch below)
(Smith & Levy, in press)
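A minimal sketch of the guessing account (all numbers invented): if the word was correctly pre-guessed with probability p, expected reading time is a mixture of a fast path and a slow path, hence linear in p.

```python
T_GUESSED, T_FULL = 100.0, 300.0  # ms; hypothetical path costs

def expected_rt(p: float) -> float:
    # E[RT] = p * fast-path cost + (1 - p) * slow-path cost: linear in p
    return p * T_GUESSED + (1 - p) * T_FULL

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"P = {p:<4}: E[RT] = {expected_rt(p):.0f} ms")
```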
Proposed probability-time relationships
• Logarithmic?
• Theory 1: Optimal perceptual discrimination (“what is this word?”; Stone, 1960; Laming, 1968; Norris, 2006)
  [Graph: posterior probability of the correct word (0 to 1) against # time steps elapsed (0 to 1200); curves for word surprisals of 1, 3, 5, and 7 bits cross a fixed decision threshold at times roughly proportional to surprisal. A sketch of the idea follows.]
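A sketch (mine) of why optimal discrimination predicts a logarithmic relationship: if evidence about the word’s identity accrues at a roughly constant rate in bits, the time to reach a fixed decision threshold grows linearly in the word’s surprisal, i.e., logarithmically in its probability. The rate is invented.

```python
import math

RATE = 0.01  # bits of evidence per time step (hypothetical)

def steps_to_threshold(p: float) -> float:
    # Initial uncertainty is -log2 p bits; recognition once it is resolved.
    return -math.log2(p) / RATE

for bits in (1, 3, 5, 7):  # the surprisal levels plotted on the slide
    print(f"{bits} bits -> {steps_to_threshold(2.0 ** -bits):.0f} time steps")
```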
• Theory 2: Highly incremental processing (Smith & Levy, in press)
Other proposed probability-time rel’nships
[Graphs of further candidate curve shapes, including the reciprocal (1/P) and super-logarithmic relationships discussed below]
Estimating probability/time curve shape
• As a proxy for “processing difficulty,” reading time in two different methods: self-paced reading (5K words) & eye-tracking (50K words)
• Challenge: we need big data to estimate curve shape, but probability is correlated with confounding variables
• GAM regression: the total contribution of word (trigram) probability to RT is near-linear in log probability over 6 orders of magnitude!
  [Plots: total amount of slowdown (ms, 0 to 80) against P(word | context) from 10⁻⁶ to 1 on a log scale, for reading times in self-paced reading and gaze durations in eye-tracking; both curves are approximately straight lines]
(Smith & Levy, in press)
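A sketch of the curve-shape analysis (simulated data, not the Smith & Levy corpora; assumes the pygam package): regress RT on a spline of log-probability and inspect the fitted partial effect, which should come out linear if surprisal is the right link.

```python
import numpy as np
from pygam import LinearGAM, s  # assumes pygam is installed

rng = np.random.default_rng(0)
n = 5000
log10_p = rng.uniform(-6, 0, n)                  # log10 P(word | context)
surprisal_bits = -log10_p * np.log2(10)
rt = 320 + 10 * surprisal_bits + rng.normal(0, 40, n)  # log-linear ground truth

# A spline term lets the data choose the curve shape, as in a GAM regression.
gam = LinearGAM(s(0)).fit(log10_p.reshape(-1, 1), rt)
XX = gam.generate_X_grid(term=0)
effect = gam.partial_dependence(term=0, X=XX)
print("fitted effect, sampled along the grid:", np.round(effect[::20], 1))
```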
Implications for different theories
• Not good for guessing theories
• Not good for the reciprocal theory
• Not good for the super-logarithmic theory of UID
  • But UID could still be rescued by an “optimal alignment with the speaker” view
• Good for theories based on:
  • Optimal perceptual discrimination
  • Highly incremental processing
Implications for predictability norming
Implications for higher-level processing
• I discussed this test of surprisal in terms of lexical predictability
• But let’s revisit syntactic expectations
• We argued that extraposed who was difficult because of syntactic expectations
• Lexically, who is “unpredictable” in both cases, but extraposed who is many bits more surprising
Final case study: Cloze and linguistic experience
• The Cloze task: participants complete a context such as
  the children went outside to…
  with responses like play, eat, play, …; a word’s Cloze probability is its relative frequency among the completions
(Smith & Levy, 2011)
Cloze and linguistic experience
• To understand the relationship among these, we want to compare “ground truth” corpus probabilities to Cloze continuations:
  • Google Web n-grams
  • Google Books n-grams
Example contexts & method
• Collect lots of completions to different contexts:
  In the winter and ______
  It was no great ______
  He played a key ______
  The time needed to ______
• Fit a multivariate model to predict the completions
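Estimating Cloze probabilities from the collected completions is just relative-frequency counting; a sketch with invented responses:

```python
from collections import Counter

# Invented completions for one context, e.g. "He played a key ______":
completions = ["role", "role", "part", "role", "role", "game", "part"]

counts = Counter(completions)
n = len(completions)
for word, c in counts.most_common():
    print(f"{word:>5}: cloze P = {c}/{n} = {c / n:.2f}")
```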
Results
[Figures comparing Cloze continuation probabilities with corpus probabilities]
What predicts reading times?
• We ran a self-paced reading study with the same materials
• Cloze probabilities significantly predicted target word RTs
• Corpus probabilities did not
General summary
• Probabilistic grammars and surprisal theory unify ambiguity resolution and prediction
• Prediction takes into account rich syntactic & semantic contexts
• Striking quantitative support for surprisal as the right index of incremental processing difficulty
• Surprisal unifies grammatical expectations and lexical predictability
• The relationship between measured estimates of linguistic experience and human prediction is non-trivial
Acknowledgments
• Collaborators: Nathaniel Smith, Evelina Fedorenko, Mara Breen, Ted Gibson
• Funding: National Science Foundation, National Institutes of Health (NICHD), Alfred P. Sloan Foundation
• UCSD Computational Psycholinguistics Lab
Thank you!
http://idiom.ucsd.edu/~rlevy
http://grammar.ucsd.edu/cpl