
Sentence Comprehension-I

Introduction and Jurafsky Model

Resource: A Probabilistic Model of Lexical and Syntactic Access and Disambiguation, Jurafsky 1996

What is Reading?

The process of accessing words? Or what happens after all the words are recognized?
Analyzing it
Evaluating it
Creating new knowledge
Thinking guided by print?

Psycholinguistics

Issues in Language Processing

What is constructed when we comprehend a sentence? A propositional representation.

What role do words play? The mental lexicon.

How does the process of constructing a representation occur?

How is the ability to construct a representation acquired?

Reading Time Experiments

Reading time: the total time taken by a reader to read a piece of text.

Reading time experiments:
Word-by-word reading
Sentence-by-sentence reading
Eye-tracking experiments

Reading time serves as a measure of how difficult a sentence is to comprehend.

Measuring Reading Time with Eye Tracking

Source: https://wiki.brown.edu/confluence/display/kertzlab/Eye-Tracking+While+Reading

Eye Tracking While Reading (ETWR) Experiment

General Observation

Reading time is higher for ambiguous sentences

Modelling Reading Time

Probabilistic Context Free Grammar (PCFG) model

Entropy Reduction model

Competition-Integration model

Connectionist model


Disambiguation

Assumptions by Jurafsky (1996):
Observed preferences in the interpretation of ambiguous sentences reflect the probabilities of different syntactic structures.
Processing difficulty is a continuum:
▪ Slight preferences at one end
▪ Garden path constructions at the other end

Several types of ambiguity:
Lexical category ambiguity
Attachment ambiguity
Unexpected thematic fit
Main clause vs. reduced relative clause ambiguity

Garden Path Phenomena

Garden path: from the idiom “to lead someone down the garden path”, i.e. to mislead or deceive them.

A garden path sentence [Wikipedia]:
Is a grammatically correct sentence
Starts in such a way that the reader’s most likely interpretation will be incorrect
Lures the reader into a dead-end parse

Garden path sentences will be marked with # from now on.

Garden Path Example

(1) #The old man the boat.
(2) #The horse raced past the barn fell.
(3) #The complex houses married and single soldiers and their families.
(4) #The government plans to raise taxes were defeated.

Lexical Category Ambiguity

Ambiguity resolved without trouble (fires = N or V):
1a. The warehouse fires destroyed all the buildings.
1b. The warehouse fires a dozen employees each year.

Ambiguity leads to a garden path (complex = N or Adj, houses = N or V, etc.):
2a. #The complex houses married and single students.
2b. #The old man the boats.

Attachment Ambiguity

(1) The spy saw the policeman with binoculars.

(2) The spy saw the policeman with a revolver.

(3) The bird saw the birdwatcher with binoculars.

Attachment Ambiguity

A prepositional phrase can attach to either the NP or the VP.

1. I saw the man with the glasses.
VP attachment: [VP saw [NP the man] [PP with the glasses]] (the glasses were used for seeing)
NP attachment: [VP saw [NP the man [PP with the glasses]]] (the man has the glasses)

Subcategorization Frames

The arguments required by a verb are its subcategorization frame or valence.

Primary arguments
Secondary arguments

Attachment preferences vary between verbs

(1) The women discussed the dogs on the beach.
a. The women discussed the dogs which were on the beach. (90%)
b. The women discussed them (the dogs) while on the beach. (10%)

(2) The women kept the dogs on the beach.
a. The women kept the dogs which were on the beach. (5%)
b. The women kept them (the dogs) on the beach. (95%)

Unexpected Thematic Fit

(1) The cop arrested by the detective was guilty of taking bribes.

(2) The crook arrested by the detective was guilty of taking bribes.

(1) introduces more disambiguation difficulty because the initial noun phrase (the cop) is a good agent of the first verb (arrested).

Main Clause vs. Reduced Relative Clause

Reduced relative clause: a relative clause without the relativizer that.
a. #The horse raced past the barn fell.
a′. #The horse (that) raced past the barn fell.
b. The horse found in the woods died.
b′. The horse (that was) found in the woods died.

Another case of different subcategorization preferences:
X raced >> X raced Y (intransitive preferred over transitive)
X found Y >> X found (transitive preferred over intransitive)

Serial Parsing Model

If multiple rules can apply, choose one based on a selection rule (determinism).

Example selection rule: minimal attachment (choose the tree with the fewest nodes).

If the parse fails, backtrack to the choice point and reparse.

When backtracking occurs, it causes increased processing times.

Parallel Parsing Model

If multiple rules can apply, pursue all possibilities in parallel (non-determinism).

If any parse fails, discard it.

Problem: the number of parse trees can grow exponentially.
Solution: only pursue a limited number of possibilities (bounded parallelism); prune some of the unpromising parses.

A garden path means the correct tree was pruned from the search space; backtracking occurs, causing increased processing times.

A Probabilistic Parallel Parser

How to model non-determinism? Probabilistic parsing.

Parsing model [Jurafsky (1996)]:
Each full or partial parse is assigned a probability.
Parses are pruned from the search space if their probability is more than a factor of α below that of the most probable parse (beam search).

How are parse probabilities determined?

Computing Parse Probabilities

Jurafsky (1996) focuses on two sources of information:

Construction probabilities: the probability of the syntactic tree.

Valence probabilities: the probability of particular syntactic categories as arguments of specific verbs.

Construction and valence probabilities are assumed to be independent, so the probability of a parse is their product, and each can be estimated from a large treebank using relative frequencies.
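As a concrete illustration, here is a minimal sketch of relative-frequency estimation, P(A → β) = Count(A → β) / Count(A); the function name and data layout are illustrative assumptions, not code from the paper:

```python
from collections import Counter

def estimate_rule_probs(treebank_rules):
    """Relative-frequency estimation of PCFG rule probabilities.

    treebank_rules: list of (lhs, rhs) pairs, one per rule occurrence
    observed in the treebank, e.g. ("NP", ("Det", "Nominal")).
    Returns {(lhs, rhs): P(lhs -> rhs)} with
    P(lhs -> rhs) = Count(lhs -> rhs) / Count(lhs).
    """
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Toy treebank: NP expands to Det Nominal 8 times and to Pronoun twice.
rules = [("NP", ("Det", "Nominal"))] * 8 + [("NP", ("Pronoun",))] * 2
print(estimate_rule_probs(rules)[("NP", ("Det", "Nominal"))])  # 0.8
```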

Grammar of a Language

A grammar for a language is a set of rewrite rules.

Context-free rule: 𝐴 → 𝛼, where the left-hand side is a single non-terminal symbol 𝐴 and 𝛼 is a string of terminal and non-terminal symbols (e.g. 𝐴 → 𝐵𝐶).

Context-sensitive rule: 𝛼𝐴𝛽 → 𝛼𝛾𝛽, where the non-terminal 𝐴 may be rewritten as 𝛾 only in the context 𝛼 … 𝛽.

Probabilistic Context Free Grammar (PCFG)

A PCFG is a probabilistic version of a CFG where each production is annotated with a probability:
▪ 𝐴 → 𝛼 [p]

The probabilities of all productions rewriting a given non-terminal must add to 1, defining a distribution for each non-terminal.

String generation is now probabilistic: production probabilities are used to non-deterministically select a production for rewriting a given non-terminal.

Probabilistic Context Free Grammar (PCFG)

A PCFG consists of:
a set of non-terminal symbols
a set of terminal symbols
a set of production rules, each with a probability
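To make the generative view concrete, here is a minimal sketch of probabilistic generation from a toy PCFG (the grammar fragment and function names are illustrative assumptions, not part of the lecture):

```python
import random

# Toy PCFG: non-terminal -> list of (rhs, probability); each LHS's probabilities sum to 1.
GRAMMAR = {
    "S":    [(("NP", "VP"), 1.0)],
    "NP":   [(("she",), 0.4), (("the", "Noun"), 0.6)],
    "Noun": [(("dog",), 0.5), (("book",), 0.5)],
    "VP":   [(("sleeps",), 0.7), (("reads",), 0.3)],
}

def generate(symbol="S"):
    """Expand `symbol` by sampling productions according to their probabilities."""
    if symbol not in GRAMMAR:  # terminal symbol: emit it as-is
        return [symbol]
    rhss, probs = zip(*GRAMMAR[symbol])
    rhs = random.choices(rhss, weights=probs)[0]  # sample one production
    return [word for part in rhs for word in generate(part)]

print(" ".join(generate()))  # e.g. "the dog sleeps"
```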

PCFG Example


Sentence probability: sum of probabilities of all of its derivations
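In standard PCFG notation (consistent with the slide): the probability of a derivation (tree) T is the product of the probabilities of the rules used in it, and the probability of a sentence s sums over all trees that yield s:

```latex
P(T) = \prod_{(A \to \beta) \in T} p(A \to \beta)
\qquad
P(s) = \sum_{T \,:\, \mathrm{yield}(T) = s} P(T)
```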

PCFG Normal Forms

Chomsky Normal Form (CNF): a PCFG G is in CNF if every production has one of the forms 𝐴 → 𝐵𝐶 or 𝐴 → 𝑎 (two non-terminals, or a single terminal).

Conversion to CNF:
▪ A longer rule such as 𝐴 → 𝐵𝐶𝐷 will be converted to 𝐴 → 𝐵𝑋 and 𝑋 → 𝐶𝐷, where 𝑋 is a new non-terminal.

Simple PCFG

Grammar:
S → NP VP [0.8]
S → Aux NP VP [0.1]
S → VP [0.1]
NP → Pronoun [0.2]
NP → Proper-Noun [0.2]
NP → Det Nominal [0.6]
Nominal → Noun [0.3]
Nominal → Nominal Noun [0.2]
Nominal → Nominal PP [0.5]
VP → Verb [0.2]
VP → Verb NP [0.5]
VP → VP PP [0.3]
PP → Prep NP [1.0]

The probabilities of the productions for each non-terminal (S, NP, Nominal, VP, PP) sum to 1.0.

Lexicon:
Det → the [0.6] | a [0.2] | that [0.1] | this [0.1]
Noun → book [0.1] | flight [0.5] | meal [0.2] | money [0.2]
Verb → book [0.5] | include [0.2] | prefer [0.3]
Pronoun → I [0.5] | he [0.1] | she [0.1] | me [0.3]
Proper-Noun → Houston [0.8] | NWA [0.2]
Aux → does [1.0]
Prep → from [0.25] | to [0.25] | on [0.1] | near [0.2] | through [0.2]
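As a worked example, one parse of “book the flight” under this grammar uses the rules S → VP, VP → Verb NP, Verb → book, NP → Det Nominal, Det → the, Nominal → Noun, Noun → flight; its probability is the product of those rule probabilities (a sketch, using only the numbers listed above):

```python
# Rule probabilities for one parse of "book the flight":
# S -> VP (0.1), VP -> Verb NP (0.5), Verb -> book (0.5),
# NP -> Det Nominal (0.6), Det -> the (0.6),
# Nominal -> Noun (0.3), Noun -> flight (0.5)
rule_probs = [0.1, 0.5, 0.5, 0.6, 0.6, 0.3, 0.5]

p_tree = 1.0
for p in rule_probs:
    p_tree *= p
print(p_tree)  # ~= 0.00135
```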

Probabilistic Grammar Conversion

Original Grammar:
S → NP VP [0.8]
S → Aux NP VP [0.1]
S → VP [0.1]
NP → Pronoun [0.2]
NP → Proper-Noun [0.2]
NP → Det Nominal [0.6]
Nominal → Noun [0.3]
Nominal → Nominal Noun [0.2]
Nominal → Nominal PP [0.5]
VP → Verb [0.2]
VP → Verb NP [0.5]
VP → VP PP [0.3]
PP → Prep NP [1.0]

Chomsky Normal Form:
S → NP VP [0.8]
S → X1 VP [0.1]
X1 → Aux NP [1.0]
S → book [0.01] | include [0.004] | prefer [0.006]
S → Verb NP [0.05]
S → VP PP [0.03]
NP → I [0.1] | he [0.02] | she [0.02] | me [0.06]
NP → Houston [0.16] | NWA [0.04]
NP → Det Nominal [0.6]
Nominal → book [0.03] | flight [0.15] | meal [0.06] | money [0.06]
Nominal → Nominal Noun [0.2]
Nominal → Nominal PP [0.5]
VP → book [0.1] | include [0.04] | prefer [0.06]
VP → Verb NP [0.5]
VP → VP PP [0.3]
PP → Prep NP [1.0]

Unit productions are eliminated by multiplying probabilities down the chain, e.g. S → book gets P(S → VP) × P(VP → Verb) × P(Verb → book) = 0.1 × 0.2 × 0.5 = 0.01, and NP → I gets P(NP → Pronoun) × P(Pronoun → I) = 0.2 × 0.5 = 0.1.

PCFG Removing Left Recursion
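The standard transformation (a general fact about CFGs, not specific to Jurafsky's model) replaces left recursion with right recursion, since left-recursive rules make naive top-down parsers loop forever:

𝐴 → 𝐴𝛼 | 𝛽 becomes 𝐴 → 𝛽𝐴′, 𝐴′ → 𝛼𝐴′ | ε

For example, Nominal → Nominal PP together with Nominal → Noun becomes Nominal → Noun Nominal′ with Nominal′ → PP Nominal′ | ε.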

Valence Probabilities

Subcategorization frames of the verb keep, e.g. keep ⟨NP⟩ vs. keep ⟨NP PP⟩ (as in “kept the dogs on the beach”).

Valence probabilities tell us how likely each of these frames is.

Valence Probabilities

Like PCFG probabilities, valence probabilities are estimated from a treebank.
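A minimal sketch of the same relative-frequency idea applied to valence; the frame labels and counts below are illustrative assumptions that loosely echo the keep vs. discuss preferences above, not figures from a real treebank:

```python
from collections import Counter

# Toy (verb, frame) observations extracted from a treebank.
observations = (
    [("keep", "<NP PP>")] * 95 + [("keep", "<NP>")] * 5 +
    [("discuss", "<NP>")] * 90 + [("discuss", "<NP PP>")] * 10
)

frame_counts = Counter(observations)
verb_counts = Counter(verb for verb, _ in observations)

def valence_prob(verb, frame):
    """P(frame | verb) = Count(verb, frame) / Count(verb)."""
    return frame_counts[(verb, frame)] / verb_counts[verb]

print(valence_prob("keep", "<NP PP>"))     # 0.95
print(valence_prob("discuss", "<NP PP>"))  # 0.1
```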

Modeling Garden Path Effects

Garden path caused by construction probabilities


Modeling Disambiguation

Disambiguation using construction probabilities, no garden path: “The warehouse fires destroyed all the buildings.”

Modeling Disambiguation

Disambiguation using construction probabilities, no garden path: “The warehouse fires a dozen employees each year.”

Modeling Valence Preferences

Disambiguation using valence probabilities, no garden path: “keep the dogs on the beach”


Modeling Valence Preferences

Disambiguation using valence probabilities, no garden path: “discuss the dogs on the beach”


Combining Valence and Construction Probabilities

Garden path caused by construction probabilities and valence probabilities (main verb interpretation): “the horse raced past…”

Combining Valence and Construction Probabilities

Garden path caused by construction probabilities and valence probabilities (reduced relative interpretation): “the horse raced past…”

Combining Valence and Construction Probabilities

Disambiguation using construction probabilities and valence probabilities, no garden path (main verb interpretation): “The bird found in the room died”

Combining Valence and Construction Probabilities

Disambiguation using construction probabilities and valence probabilities, no garden path (reduced relative interpretation): “The bird found in the room died”

Setting the Beam Width

Crucial assumption: if the relative probability of a parse tree falls below a certain value, it will be pruned from the search space.

Assumption: a garden path occurs if the probability ratio between the preferred and the dispreferred parse is higher than 5:1, as illustrated in the sketch below.
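A minimal sketch of this pruning rule, assuming parses are represented as (probability, tree) pairs and a beam width of α = 5 per the slide (the data structures are illustrative, not Jurafsky's implementation):

```python
ALPHA = 5.0  # beam width: prune parses more than a factor of 5 below the best

def prune(parses, alpha=ALPHA):
    """Keep only parses whose probability is within a factor `alpha`
    of the most probable parse; the rest fall outside the beam.

    parses: list of (probability, tree) pairs.
    """
    if not parses:
        return []
    best = max(p for p, _ in parses)
    return [(p, tree) for p, tree in parses if p * alpha >= best]

# Toy example: a 6:1 ratio exceeds the 5:1 beam, so the weaker parse is
# pruned -- if it later turns out to be the correct one, a garden path results.
parses = [(0.006, "main-verb parse"), (0.001, "reduced-relative parse")]
print(prune(parses))  # [(0.006, 'main-verb parse')]
```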