LING 180 Autumn 2007
LINGUIST 180: Introduction to Computational Linguistics
Lecture 10: Grammar and Parsing (II), October 25, 2007
Dan Jurafsky
Thanks to Jim Martin for many of these slides
Grammars and Parsing

Context-Free Grammars and Constituency
Some common CFG phenomena for English
Baby parsers: top-down and bottom-up parsing
Today, real parsers: dynamic programming parsing
– CKY
– Probabilistic parsing
– Optional section: the Earley algorithm
Dynamic Programming

We need a method that fills a table with partial results that:
– Does not do (avoidable) repeated work
– Does not fall prey to left-recursion
– Can find all the pieces of an exponential number of trees in polynomial time
Two popular methods: CKY and Earley
The CKY (Cocke-Kasami-Younger) Algorithm

Requires the grammar to be in Chomsky Normal Form (CNF).
All rules must be in one of the following forms:
– A -> B C
– A -> w
Any grammar can be converted automatically to Chomsky Normal Form.
Converting to CNF

Rules that mix terminals and non-terminals:
– Introduce a new dummy non-terminal that covers the terminal
– INF-VP -> to VP is replaced by INF-VP -> TO VP and TO -> to
Rules that have a single non-terminal on the right ("unit productions"):
– Rewrite each unit production with the RHS of its expansion
Rules whose right-hand side is longer than 2:
– Introduce dummy non-terminals that spread the right-hand side
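The long-rule case can be sketched in a few lines of Python; the function name and the X1, X2, … dummy labels are illustrative choices, not part of any standard tool:

```python
def binarize(rules):
    """One step of CNF conversion: split any rule whose right-hand
    side is longer than 2 into a chain of binary rules, introducing
    fresh dummy non-terminals. (Unit productions and terminal/non-
    terminal mixes are handled by the other two steps.)"""
    out, fresh = [], 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            fresh += 1
            dummy = f"X{fresh}"
            out.append((lhs, (rhs[0], dummy)))  # peel off the first symbol
            lhs, rhs = dummy, rhs[1:]
        out.append((lhs, rhs))
    return out

print(binarize([("VP", ("V", "NP", "PP"))]))
# [('VP', ('V', 'X1')), ('X1', ('NP', 'PP'))]
```

The spread rule VP -> V NP PP becomes VP -> V X1 plus X1 -> NP PP, so a binary parser can still recover the original flat constituent.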
Automatic Conversion to CNF
Sample Grammar
Back to CKY Parsing

Given rules in CNF, consider the rule A -> B C:
– If there is an A in the input, then there must be a B followed by a C in the input.
– If the A goes from i to j in the input, then there must be some k such that i < k < j.
– I.e., the B splits from the C someplace.
CKY

So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.
So a non-terminal spanning an entire string will sit in cell [0,n].
If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j.
CKY

Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j].
In other words, if we think there might be an A spanning i,j in the input… AND A -> B C is a rule in the grammar, THEN there must be a B in [i,k] and a C in [k,j] for some i < k < j.
So just loop over the possible k values.
CKY Table

Filling the [i,j]th cell in the CKY table
CKY Algorithm
Note
We arranged the loops to fill the table a column at a time, from left to right, bottom to top.
This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below).
Are there other ways to fill the table?
0 Book 1 the 2 flight 3 through 4 Houston 5
CYK Example

S -> NP VP
VP -> V NP
NP -> NP PP
VP -> VP PP
PP -> P NP
NP -> John | Mary | Denver
V -> called
P -> from
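A minimal CKY recognizer over exactly this grammar can be sketched in Python (the table is a dict of sets of non-terminals; the function and variable names are illustrative):

```python
from collections import defaultdict

GRAMMAR = [("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NP", "PP")),
           ("VP", ("VP", "PP")), ("PP", ("P", "NP"))]
LEXICON = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}

def cky(words):
    n = len(words)
    table = defaultdict(set)            # table[i, j] = non-terminals spanning words[i:j]
    for i, w in enumerate(words):
        table[i, i + 1] = set(LEXICON[w])
    # Fill by increasing span length, so smaller constituents are always ready
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):   # every way to split the span
                for lhs, (b, c) in GRAMMAR:
                    if b in table[i, k] and c in table[k, j]:
                        table[i, j].add(lhs)
    return table

chart = cky("John called Mary from Denver".split())
print("S" in chart[0, 5])  # True
```

An S lands in the top cell [0,5], so the sentence is recognized; the recognizer does not say which of the two analyses produced it.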
Example

John called Mary from Denver

Slides 16–31 step through filling the CKY table for this sentence cell by cell. The diagonal holds the word categories (NP for John, Mary, and Denver; V for called; P for from). Span [1,3] yields a VP (called Mary), span [3,5] a PP (from Denver), and span [2,5] an NP (Mary from Denver). Span [1,5] receives two VP analyses, VP -> V NP (with the PP inside the NP) and VP -> VP PP, and the top cell [0,5] receives an S built from each, reflecting the attachment ambiguity. Cells marked X in the slides are spans where no rule applies.
Back to Ambiguity

Did we solve it?
Ambiguity
Ambiguity
No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer:
– The presence of an S state with the right attributes in the right place indicates a successful recognition.
– But no parse tree… no parser.
– That's how we solve (not) an exponential problem in polynomial time.
Converting CKY from Recognizer to Parser

With the addition of a few pointers, we have a parser.
Augment each new cell in the chart to point to where we came from.
How to do parse disambiguation

Probabilistic methods:
– Augment the grammar with probabilities
– Then modify the parser to keep only the most probable parses
– And at the end, return the most probable parse
Probabilistic CFGs
The probabilistic model: assigning probabilities to parse trees
Getting the probabilities for the model
Parsing with probabilities:
– Slight modification to the dynamic programming approach
– Task is to find the max-probability tree for an input
Probability Model

Attach probabilities to grammar rules.
The expansions for a given non-terminal sum to 1:
VP -> Verb         .55
VP -> Verb NP      .40
VP -> Verb NP NP   .05
Read this as P(specific rule | LHS).
PCFG
PCFG
Probability Model (1)
A derivation (tree) consists of the set of grammar rules that are in the tree.
The probability of a tree is just the product of the probabilities of the rules in the derivation.
Probability model
P(T,S) = P(T) P(S|T) = P(T), since P(S|T) = 1

P(T,S) = P(T) = ∏_{n ∈ T} p(r(n))
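That product can be sketched directly; the rule probabilities below are invented for illustration (each LHS's expansions sum to 1), and trees are nested tuples:

```python
from math import prod

# Toy rule probabilities, made up for illustration
P = {("NP", ("Det", "Noun")): 0.3, ("NP", ("NP", "PP")): 0.7,
     ("Det", ("the",)): 0.6, ("Det", ("a",)): 0.4,
     ("Noun", ("flight",)): 0.1, ("Noun", ("book",)): 0.9}

def tree_prob(tree):
    """P(T) = product over the nodes n of T of p(r(n)).
    A tree is a tuple (label, child, ...); leaf words are plain strings."""
    if isinstance(tree, str):
        return 1.0                       # a word contributes no rule of its own
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return P[(label, rhs)] * prod(tree_prob(c) for c in children)

t = ("NP", ("Det", "the"), ("Noun", "flight"))
print(tree_prob(t))  # 0.3 * 0.6 * 0.1 ≈ 0.018
```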
Probability Model (1.1)
The probability of a word sequence P(S) is the probability of its tree in the unambiguous case.
It's the sum of the probabilities of the trees in the ambiguous case.
Getting the Probabilities
From an annotated database (a treebank).
So for example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall.
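The count-and-divide estimate can be sketched in a few lines; the tiny list of rule occurrences is invented for illustration:

```python
from collections import Counter

# Hypothetical flattened treebank: one (LHS, RHS) entry per rule occurrence
observed = [("VP", ("Verb",)), ("VP", ("Verb", "NP")), ("VP", ("Verb", "NP")),
            ("VP", ("Verb", "NP")), ("NP", ("Det", "Noun"))]

rule_counts = Counter(observed)
lhs_counts = Counter(lhs for lhs, _ in observed)

def mle(lhs, rhs):
    """P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("VP", ("Verb", "NP")))  # 0.75: three of the four observed VPs use this rule
```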
TreeBanks
Treebanks
Treebanks
Treebank Grammars
Lots of flat rules
Example sentences from those rules

Total: over 17,000 different grammar rules in the 1-million-word Treebank corpus
Probabilistic Grammar Assumptions

We're assuming that there is a grammar to be used to parse with.
We're assuming the existence of a large robust dictionary with parts of speech.
We're assuming the ability to parse (i.e., a parser).
Given all that… we can parse probabilistically.
Typical Approach

Bottom-up (CKY) dynamic programming approach.
Assign probabilities to constituents as they are completed and placed in the table.
Use the max probability for each constituent going up.
What's that last bullet mean?

Say we're talking about a final part of a parse: S -> NP [0,i] VP [i,j]
The probability of the S is… P(S -> NP VP) P(NP) P(VP)
The P(NP) and P(VP) are already known, since we're doing bottom-up parsing.
Max

I said the P(NP) is known.
What if there are multiple NPs for the span of text in question (0 to i)?
Take the max (where?)
CKY: Where does the max go?
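A sketch of one answer, with made-up probabilities (each non-terminal's expansions, including the lexical ones, sum to 1): each cell keeps, for every non-terminal, only the max probability over all splits and rules that build it.

```python
from collections import defaultdict

RULES = [("S", ("NP", "VP"), 1.0), ("VP", ("V", "NP"), 0.6),
         ("VP", ("VP", "PP"), 0.4), ("NP", ("NP", "PP"), 0.3),
         ("PP", ("P", "NP"), 1.0)]
LEX = {"John": [("NP", 0.2)], "Mary": [("NP", 0.2)], "Denver": [("NP", 0.3)],
       "called": [("V", 1.0)], "from": [("P", 1.0)]}

def pcky(words):
    n = len(words)
    best = defaultdict(dict)            # best[i, j][A] = max prob of an A over words[i:j]
    for i, w in enumerate(words):
        for tag, p in LEX[w]:
            best[i, i + 1][tag] = p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, (b, c), p in RULES:
                    if b in best[i, k] and c in best[k, j]:
                        cand = p * best[i, k][b] * best[k, j][c]
                        if cand > best[i, j].get(lhs, 0.0):   # the max goes here
                            best[i, j][lhs] = cand
    return best

vchart = pcky("John called Mary from Denver".split())
print(vchart[0, 5]["S"])  # ≈ 0.00288: the VP-attachment parse wins over ≈ 0.00216
```

With backpointers stored at each max, reading out the best tree from [0,5] is straightforward.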
Problems with PCFGs

The probability model we're using is just based on the rules in the derivation…
– Doesn't use the words in any real way
– Doesn't take into account where in the derivation a rule is used
Solution

Add lexical dependencies to the scheme…
Infiltrate the predilections of particular words into the probabilities in the derivation.
I.e., condition the rule probabilities on the actual words.
Heads

To do that, we're going to make use of the notion of the head of a phrase:
– The head of an NP is its noun
– The head of a VP is its verb
– The head of a PP is its preposition
(It's really more complicated than that, but this will do.)
Example (right)
Attribute grammar
Example (wrong)
How?

We used to have:
– VP -> V NP PP with P(rule | VP)
– That's the count of this rule divided by the number of VPs in a treebank
Now we have:
– VP(dumped) -> V(dumped) NP(sacks) PP(in)
– P(r | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
Not likely to have significant counts in any treebank.
Declare Independence

When stuck, exploit independence and collect the statistics you can…
We'll focus on capturing two things:
Verb subcategorization
– Particular verbs have affinities for particular VPs
Objects' affinities for their predicates (mostly their mothers and grandmothers)
– Some objects fit better with some predicates than others
Subcategorization

Condition particular VP rules on their head… so for r: VP -> V NP PP,
P(r | VP) becomes P(r | VP ∧ dumped).
What's the count?
How many times this rule was used with (head) dump, divided by the total number of VPs that dump appears in as head.
Preferences

Subcat captures the affinity between VP heads (verbs) and the VP rules they go with.
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…
Example (right)
Example (wrong)
Preferences
The issue here is the attachment of the PP, so the affinities we care about are the ones between dumped and into vs. sacks and into.
So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize.
Vs. the situation where sacks is a constituent with into as the head of a PP daughter.
Preferences (2)
Consider the VPs:
– Ate spaghetti with gusto
– Ate spaghetti with marinara
The affinity of gusto for eat is much larger than its affinity for spaghetti.
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate.
Preferences (2)
Note: the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs.

[Trees: in "Ate spaghetti with gusto," the Pp(with) attaches to Vp(ate); in "Ate spaghetti with marinara," the Pp(with) attaches to Np(spag).]
Summary
Context-Free Grammars
Parsing
– Top-down, bottom-up metaphors
– Dynamic programming parsers: CKY, Earley
Disambiguation
– PCFG
– Probabilistic augmentations to parsers
– Treebanks
Optional section: the Earley algorithm
Problem (minor)
We said CKY requires the grammar to be binary (i.e., in Chomsky Normal Form).
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal.
Except that when you change the grammar, the trees come out wrong.
All things being equal, we'd prefer to leave the grammar alone.
Earley Parsing
Allows arbitrary CFGs.
Where CKY is bottom-up, Earley is top-down.
Fills a table in a single sweep over the input words:
– Table is length N+1; N is the number of words
– Table entries represent:
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents
States
The table entries are called states and are represented with dotted rules:
S -> · VP               A VP is predicted
NP -> Det · Nominal     An NP is in progress
VP -> V NP ·            A VP has been found
States/Locations

It would be nice to know where these things are in the input, so…
S -> · VP [0,0]               A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]     An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]            A VP has been found starting at 0 and ending at 3
Graphically
Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete.
If that's the case, you're done:
S -> α · [0, n+1]
Earley Algorithm
March through the chart left-to-right.
At each step, apply 1 of 3 operators:
Predictor
– Create new states representing top-down expectations
Scanner
– Match word predictions (rule with word after dot) to words
Completer
– When a state is complete, see what rules were looking for that completed constituent
Predictor
Given a state
– With a non-terminal to the right of the dot
– That is not a part-of-speech category
Create a new state for each expansion of the non-terminal.
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.
So the predictor looking at
– S -> · VP [0,0]
results in
– VP -> · Verb [0,0]
– VP -> · Verb NP [0,0]
Scanner
Given a state
– With a non-terminal to the right of the dot
– That is a part-of-speech category
If the next word in the input matches this part of speech:
Create a new state with the dot moved over the non-terminal.
So the scanner looking at
– VP -> · Verb NP [0,0]
If the next word, "book", can be a verb, add the new state
– VP -> Verb · NP [0,1]
Add this state to the chart entry following the current one.
Note: the Earley algorithm uses top-down input to disambiguate POS: only a POS predicted by some state can get added to the chart.
Completer
Applied to a state when its dot has reached the right end of the rule.
The parser has discovered a category over some span of the input.
Find and advance all previous states that were looking for this category:
– Copy the state, move the dot, insert in the current chart entry
Given:
NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]
Add:
VP -> Verb NP · [0,3]
Earley: how do we know we are done?

Find an S state in the final column that spans from 0 to n+1 and is complete.
If that's the case, you're done:
S -> α · [0, n+1]
Earley
So sweep through the table from 0 to n+1…
– New predicted states are created by starting top-down from S
– New incomplete states are created by advancing existing states as new constituents are discovered
– New complete states are created in the same way
Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at N+1 to see if you have a winner
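Those steps, with the three operators above, can be sketched as a compact recognizer. This is an illustrative Python implementation for a small "book that flight" grammar, not reference code from the lecture; the names and grammar are assumptions:

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start")   # dot = RHS symbols seen so far

GRAMMAR = {"S": [("VP",)], "VP": [("Verb",), ("Verb", "NP")],
           "NP": [("Det", "Nominal")], "Nominal": [("Noun",)]}
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {t for tags in LEXICON.values() for t in tags}

def earley(words):
    chart = [[] for _ in range(len(words) + 1)]
    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)
    add(0, State("GAMMA", ("S",), 0, 0))           # dummy start state
    for i in range(len(words) + 1):
        j = 0
        while j < len(chart[i]):                   # chart[i] grows as we process it
            st = chart[i][j]; j += 1
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in POS:                     # Scanner
                    if i < len(words) and nxt in LEXICON[words[i]]:
                        add(i + 1, State(nxt, (words[i],), 1, i))
                else:                              # Predictor
                    for rhs in GRAMMAR[nxt]:
                        add(i, State(nxt, rhs, 0, i))
            else:                                  # Completer
                for prev in chart[st.start]:
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == st.lhs:
                        add(i, State(prev.lhs, prev.rhs, prev.dot + 1, prev.start))
    return any(s.lhs == "S" and s.dot == len(s.rhs) and s.start == 0
               for s in chart[len(words)])

print(earley(["book", "that", "flight"]))  # True
```

The final column ends up containing a complete S state spanning the whole input, which is exactly the done-check from the previous slides.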
Example
Book that flight.
We should find… an S from 0 to 3 that is a completed state.
Example
Example
Example
Details
What kind of algorithms did we just describe (both Earley and CKY)?
Not parsers – recognizers:
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
Back to Ambiguity

Did we solve it?
Ambiguity
Ambiguity
No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer:
– The presence of an S state with the right attributes in the right place indicates a successful recognition.
– But no parse tree… no parser.
– That's how we solve (not) an exponential problem in polynomial time.
Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser.
Augment the "Completer" to point to where we came from.
Augmenting the chart with structural information
Retrieving Parse Trees from Chart
All the possible parses for an input are in the table.
We just need to read off all the backpointers from every complete S in the last column of the table:
– Find all the S -> X · [0, N+1]
– Follow the structural traces from the Completer
Of course this won't be polynomial time, since there could be an exponential number of trees.
But we can at least represent ambiguity efficiently.
2
7LING 180 Autumn 2007
Sample Grammar
8LING 180 Autumn 2007
Back to CKY Parsing
Given rules in CNFConsider the rule A -gt BC
If there is an A in the input then there must be a Bfollowed by a C in the inputIf the A goes from i to j in the input then there mustbe some k st iltkltj
ndash Ie The B splits from the C someplace
9LING 180 Autumn 2007
CKY
So letrsquos build a table so that an A spanning from i to jin the input is placed in cell [ij] in the tableSo a non-terminal spanning an entire string will sit incell [0 n]If we build the table bottom up wersquoll know that theparts of the A must go from i to k and from k to j
10LING 180 Autumn 2007
CKY
Meaning that for a rule like A -gt B C we should lookfor a B in [ik] and a C in [kj]In other words if we think there might be an Aspanning ij in the inputhellip ANDA -gt B C is a rule in the grammar THENThere must be a B in [ik] and a C in [kj] for someiltkltjSo just loop over the possible k values
11LING 180 Autumn 2007
CKY Table
bullFilling the[ij]th cell inthe CKY table
12LING 180 Autumn 2007
CKY Algorithm
3
13LING 180 Autumn 2007
Note
We arranged the loops to fill the table a column at atime from left to right bottom to top
This assures us that whenever wersquore filling a cell theparts needed to fill it are already in the table (to theleft and below)Are there other ways to fill the table
14LING 180 Autumn 2007
0 Book 1 the 2 flight 3 through 4 Houston 5
15LING 180 Autumn 2007
CYK Example
S -gt NP VPVP -gt V NPNP -gt NP PPVP -gt VP PPPP -gt P NP
NP -gt John Mary DenverV -gt calledP -gt from
16LING 180 Autumn 2007
Example
John called Mary from Denver
S
VP PP
NP VP
V NP NPP
17LING 180 Autumn 2007
Example
John called Mary from Denver
S
PP
NP VP
NP
NP
V
18LING 180 Autumn 2007
Example
NP
John
calledNP
MaryV
fromNP
DenverP
4
19LING 180 Autumn 2007
Example
NP
John
calledNP
MaryVX
fromNP
DenverP
20LING 180 Autumn 2007
Example
NP
John
calledNP
MaryVX
fromNPVP
DenverP
21LING 180 Autumn 2007
Example
NP
John
calledNP
MaryVX
fromNPVP
DenverPX
22LING 180 Autumn 2007
Example
NPPP
John
calledNP
MaryVX
fromNPVP
DenverPX
23LING 180 Autumn 2007
Example
NPPP
John
calledNP
MaryV
fromNPVPS
DenverPX
24LING 180 Autumn 2007
Example
NPPP
John
calledNP
MaryVX
fromNPVPS
DenverPXX
5
25LING 180 Autumn 2007
Example
NPPPNP
John
calledNP
MaryVX
fromNPVPS
DenverPX
26LING 180 Autumn 2007
Example
NPPPNP
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
27LING 180 Autumn 2007
Example
NPPPNPVP
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
28LING 180 Autumn 2007
Example
NPPPNPVP
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
29LING 180 Autumn 2007
Example
NPPPNPVP1VP2
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
30LING 180 Autumn 2007
Example
NPPPNPVP1VP2
S
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
6
31LING 180 Autumn 2007
Example
NPPPNPVPS
John
calledNP
MaryVX
fromNPVPS
DenverPXXX
32LING 180 Autumn 2007
Back to Ambiguity
Did we solve it
33LING 180 Autumn 2007
Ambiguity
34LING 180 Autumn 2007
Ambiguity
NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer
ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition
ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
35LING 180 Autumn 2007
Converting CKY from Recognizerto Parser
With the addition of a few pointers we have a parserAugment each new cell in chart to point to where wecame from
36LING 180 Autumn 2007
How to do parse disambiguation
Probabilistic methodsAugment the grammar with probabilitiesThen modify the parser to keep only most probableparsesAnd at the end return the most probable parse
7
37LING 180 Autumn 2007
Probabilistic CFGs
The probabilistic modelAssigning probabilities to parse trees
Getting the probabilities for the modelParsing with probabilities
Slight modification to dynamic programmingapproachTask is to find the max probability tree for an input
38LING 180 Autumn 2007
Probability Model
Attach probabilities to grammar rulesThe expansions for a given non-terminal sum to 1VP -gt Verb 55VP -gt Verb NP 40VP -gt Verb NP NP 05
Read this as P(Specific rule | LHS)
39LING 180 Autumn 2007
PCFG
40LING 180 Autumn 2007
PCFG
41LING 180 Autumn 2007
Probability Model (1)
A derivation (tree) consists of the set of grammarrules that are in the tree
The probability of a tree is just the product of theprobabilities of the rules in the derivation
42LING 180 Autumn 2007
Probability model
P(TS) = P(T)P(S|T) = P(T) since P(S|T)=1
P(TS) = p(rn )nT
8
43LING 180 Autumn 2007
Probability Model (11)
The probability of a word sequence P(S) is theprobability of its tree in the unambiguous caseItrsquos the sum of the probabilities of the trees in theambiguous case
44LING 180 Autumn 2007
Getting the Probabilities
From an annotated database (a treebank)So for example to get the probability for a particularVP rule just count all the times the rule is used anddivide by the number of VPs overall
45LING 180 Autumn 2007
TreeBanks
46LING 180 Autumn 2007
Treebanks
47LING 180 Autumn 2007
Treebanks
48LING 180 Autumn 2007
Treebank Grammars
9
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from thoserules
Total over 17000 different grammar rules in the 1-million word Treebank corpus
51LING 180 Autumn 2007
Probabilistic GrammarAssumptions
Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up
53LING 180 Autumn 2007
Whatrsquos that last bullet mean
Say wersquore talking about a final part of a parseS-gt0NPiVPj
The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)
The green stuff is already known Wersquore doing bottom-up parsing
54LING 180 Autumn 2007
Max
I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)
10
55LING 180 Autumn 2007
CKY Where does the max go
56LING 180 Autumn 2007
Problems with PCFGs
The probability model wersquore using is just based on therules in the derivationhellip
Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords
58LING 180 Autumn 2007
Heads
To do that wersquore going to make use of the notion ofthe head of a phrase
The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition
(Itrsquos really more complicated than that but this will do)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
11
61LING 180 Autumn 2007
How
We used to haveVP -gt V NP PP P(rule|VP)
ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank
Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things
Verb subcategorizationndash Particular verbs have affinities for particular VPs
Objects affinities for their predicates (mostly theirmothers and grandmothers)
ndash Some objects fit better with some predicates thanothers
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes
P(r | VP ^ dumped)
Whatrsquos the countHow many times was this rule used with (head) dump
divided by the number of VPs that dump appears (ashead) in total
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPsAte spaghetti with gustoAte spaghetti with marinara
The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs
Vp (ate) Vp(ate)
Vp(ate) Pp(with)Pp(with)
Np(spag)
npvvAte spaghetti with marinaraAte spaghetti with gusto
np
70LING 180 Autumn 2007
Summary
Context-Free GrammarsParsing
Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley
DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks
71LING 180 Autumn 2007
Optional section the Earley alg
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down
Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent
ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are representedwith dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the inputsohellip
S -gt VP [00] A VP is predicted at the start of the sentence
NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2
VP -gt V NP [03] A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-rightAt each step apply 1 of 3 operators
Predictorndash Create new states representing top-down expectations
Scannerndash Match word predictions (rule with word after dot) to
words
Completerndash When a state is complete see what rules were looking
for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at
ndash S -gt VP [00]
results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]
80LING 180 Autumn 2007
Scanner
Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at
ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state
ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category
copy state move dot insert in current chart entryGiven
NP -gt Det Nominal [13]VP -gt Verb NP [01]
AddVP -gt Verb NP [03]
82LING 180 Autumn 2007
Earley how do we know we aredone
How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specificallyhellip1 Predict all the states you can upfront2 Read a word
1 Extend states based on matches2 Add new predictions3 Go to 2
3 Look at N+1 to see if you have a winner
15
85LING 180 Autumn 2007
Example
Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (bothEarley and CKY)
Not parsers ndash recognizersndash The presence of an S state with the right attributes in
the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it
16
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer
ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition
ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
93LING 180 Autumn 2007
Converting Earley fromRecognizer to Parser
With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom
94LING 180 Autumn 2007
Augmenting the chart withstructural information
S8S9
S10
S11
S13S12
S8
S9
S8
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently
13LING 180 Autumn 2007
Note
We arranged the loops to fill the table a column at a time, from left to right, bottom to top.
This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below).
Are there other ways to fill the table?
14LING 180 Autumn 2007
0 Book 1 the 2 flight 3 through 4 Houston 5
15LING 180 Autumn 2007
CYK Example
S  -> NP VP
VP -> V NP
NP -> NP PP
VP -> VP PP
PP -> P NP

NP -> John | Mary | Denver
V  -> called
P  -> from
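The chart-filling the next slides walk through can be sketched as a CKY recognizer (a minimal sketch, assuming this rule encoding and function name; the grammar and sentence are the ones above):

```python
from collections import defaultdict

# Binary rules A -> B C and lexical rules A -> w, from the slide's grammar.
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "NP", "PP"),
          ("VP", "VP", "PP"), ("PP", "P", "NP")]
LEXICAL = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}

def cky_recognize(words):
    n = len(words)
    table = defaultdict(set)           # table[(i, j)] = non-terminals over words i..j-1
    for i, w in enumerate(words):
        table[(i, i + 1)] |= LEXICAL.get(w, set())
    # Fill a column at a time, left to right, bottom to top (see the Note slide):
    # every cell we fill only needs cells to its left and below.
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):          # k is the split point, i < k < j
                for a, b, c in BINARY:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
    return table

words = "John called Mary from Denver".split()
table = cky_recognize(words)
print("S" in table[(0, len(words))])   # True: the sentence is recognized
```

The diagonal gets the parts of speech (NP, V, NP, P, NP), and S ends up in cell [0,5].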
16LING 180 Autumn 2007
Example
John called Mary from Denver

[Tree: VP attachment — reconstructed as [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]]
17LING 180 Autumn 2007
Example
John called Mary from Denver

[Tree: NP attachment — reconstructed as [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]]
18LING 180 Autumn 2007

Example (slides 18–31: filling the CKY chart step by step)

[Figure series: the chart for John called Mary from Denver is filled a column at a time. The diagonal gets the parts of speech — NP (John), V (called), NP (Mary), P (from), NP (Denver). Combinations then fill the upper cells: V + NP gives VP over called Mary; P + NP gives PP over from Denver; NP + PP gives NP over Mary from Denver; VP + PP gives a second VP over called Mary from Denver (VP1, VP2); and NP + VP puts S over the whole string, so the ambiguity survives into the top cell. Cells marked X admit no rule.]
32LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
33LING 180 Autumn 2007
Ambiguity
34LING 180 Autumn 2007
Ambiguity
No… Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
35LING 180 Autumn 2007
Converting CKY from Recognizer to Parser

With the addition of a few pointers we have a parser.
Augment each new cell in the chart to point to where we came from.
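A minimal sketch of that augmentation over the earlier John called Mary from Denver grammar: each cell entry records a backpointer to the split point and children that produced it, and a tree is read off recursively (the encoding and names are my own assumptions, not the textbook's):

```python
from collections import defaultdict

BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "NP", "PP"),
          ("VP", "VP", "PP"), ("PP", "P", "NP")]
LEXICAL = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}

def cky_parse(words):
    n = len(words)
    table = defaultdict(set)
    back = {}                          # (i, j, A) -> (k, B, C), or the word itself
    for i, w in enumerate(words):
        for a in LEXICAL.get(w, set()):
            table[(i, i + 1)].add(a)
            back[(i, i + 1, a)] = w
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
                        back.setdefault((i, j, a), (k, b, c))  # keep first derivation
    return table, back

def build_tree(back, i, j, a):
    entry = back[(i, j, a)]
    if isinstance(entry, str):         # lexical cell: (tag, word)
        return (a, entry)
    k, b, c = entry
    return (a, build_tree(back, i, k, b), build_tree(back, k, j, c))

words = "John called Mary from Denver".split()
table, back = cky_parse(words)
if "S" in table[(0, len(words))]:
    print(build_tree(back, 0, len(words), "S"))
```

With the rule order above, the first derivation recorded (and hence the tree printed) is the NP-attachment parse; keeping all backpointers per cell instead of the first would preserve both.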
36LING 180 Autumn 2007
How to do parse disambiguation
Probabilistic methods:
Augment the grammar with probabilities
Then modify the parser to keep only the most probable parses
And at the end, return the most probable parse
37LING 180 Autumn 2007
Probabilistic CFGs
The probabilistic model
Assigning probabilities to parse trees
Getting the probabilities for the model
Parsing with probabilities
Slight modification to dynamic programming approach
Task is to find the max probability tree for an input
38LING 180 Autumn 2007
Probability Model
Attach probabilities to grammar rules.
The expansions for a given non-terminal sum to 1:
VP -> Verb         .55
VP -> Verb NP      .40
VP -> Verb NP NP   .05
Read this as P(Specific rule | LHS)
39LING 180 Autumn 2007
PCFG
40LING 180 Autumn 2007
PCFG
41LING 180 Autumn 2007
Probability Model (1)
A derivation (tree) consists of the set of grammar rules that are in the tree.
The probability of a tree is just the product of the probabilities of the rules in the derivation.
42LING 180 Autumn 2007
Probability model
P(T,S) = P(T) P(S|T) = P(T)    (since P(S|T) = 1)

P(T,S) = ∏_{n ∈ T} p(r_n)
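The product over rules can be computed directly by walking a tree (a sketch; the rule-probability table mixes the VP probabilities from the Probability Model slide with made-up values for the other rules, and the lexical tag -> word probabilities are ignored):

```python
# Hypothetical rule probabilities; only the three VP rules come from the slides.
RULE_PROB = {("S", ("NP", "VP")): 1.0,
             ("VP", ("Verb",)): .55,
             ("VP", ("Verb", "NP")): .40,
             ("VP", ("Verb", "NP", "NP")): .05,
             ("NP", ("Det", "Noun")): 1.0}

def tree_prob(tree):
    """P(T) = product of p(r_n) over the rules n used in the derivation."""
    if isinstance(tree, str):                    # a leaf word contributes no rule
        return 1.0
    label, children = tree[0], tree[1:]
    p = 1.0
    if not (len(children) == 1 and isinstance(children[0], str)):
        rhs = tuple(c[0] for c in children)      # tag -> word rules are skipped
        p = RULE_PROB[(label, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

t = ("S", ("NP", ("Det", "a"), ("Noun", "flight")),
          ("VP", ("Verb", "left")))
print(tree_prob(t))   # 1.0 * 1.0 * .55 = 0.55
```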
43LING 180 Autumn 2007
Probability Model (1.1)

The probability of a word sequence P(S) is the probability of its tree in the unambiguous case.
It's the sum of the probabilities of the trees in the ambiguous case.
44LING 180 Autumn 2007
Getting the Probabilities
From an annotated database (a treebank).
So, for example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall.
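A sketch of that count-and-divide estimate over a toy, made-up two-tree treebank (illustrative only, not real Treebank data):

```python
from collections import Counter

# Trees as (label, child, child, ...) tuples; leaves are words.
treebank = [
    ("S", ("NP", "John"), ("VP", ("V", "called"), ("NP", "Mary"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "left"))),
]

rule_count = Counter()
lhs_count = Counter()

def count_rules(tree):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return                               # skip lexical tag -> word rules
    rhs = tuple(c[0] for c in children)
    rule_count[(label, rhs)] += 1
    lhs_count[label] += 1
    for c in children:
        count_rules(c)

for t in treebank:
    count_rules(t)

# P(rule | LHS) = count(rule) / count(LHS)
p = {r: c / lhs_count[r[0]] for r, c in rule_count.items()}
print(p[("VP", ("V", "NP"))])   # 1 of the 2 VPs -> 0.5
```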
45LING 180 Autumn 2007
TreeBanks
46LING 180 Autumn 2007
Treebanks
47LING 180 Autumn 2007
Treebanks
48LING 180 Autumn 2007
Treebank Grammars
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from those rules

Total: over 17,000 different grammar rules in the 1-million-word Treebank corpus
51LING 180 Autumn 2007
Probabilistic Grammar Assumptions

We're assuming that there is a grammar to be used to parse with.
We're assuming the existence of a large robust dictionary with parts of speech.
We're assuming the ability to parse (i.e., a parser).
Given all that… we can parse probabilistically.
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approach
Assign probabilities to constituents as they are completed and placed in the table
Use the max probability for each constituent going up
53LING 180 Autumn 2007
What's that last bullet mean?

Say we're talking about a final part of a parse:
S -> NP [0,i] VP [i,j]

The probability of the S is…
P(S -> NP VP) · P(NP) · P(VP)

The green stuff is already known. We're doing bottom-up parsing.
54LING 180 Autumn 2007
Max
I said the P(NP) is known.
What if there are multiple NPs for the span of text in question (0 to i)?
Take the max (where?)
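A probabilistic CKY sketch along these lines: each (span, non-terminal) cell keeps only its max probability. The grammar probabilities are made up for illustration; with them, the max for the VP cell comes via the VP-attachment analysis (0.015 vs 0.009), which is what "take the max" decides:

```python
from collections import defaultdict

# Hypothetical CNF PCFG for John called Mary from Denver (probabilities invented).
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.5,
          ("NP", "NP", "PP"): 0.3, ("VP", "VP", "PP"): 0.5,
          ("PP", "P", "NP"): 1.0}
LEXICAL = {"John": [("NP", 0.2)], "Mary": [("NP", 0.2)], "Denver": [("NP", 0.3)],
           "called": [("V", 1.0)], "from": [("P", 1.0)]}

def pcky(words):
    n = len(words)
    best = defaultdict(float)          # (i, j, A) -> max probability of A over i..j
    for i, w in enumerate(words):
        for a, p in LEXICAL.get(w, []):
            best[(i, i + 1, a)] = p
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    cand = p * best[(i, k, b)] * best[(k, j, c)]
                    if cand > best[(i, j, a)]:     # keep only the max per constituent
                        best[(i, j, a)] = cand
    return best

words = "John called Mary from Denver".split()
best = pcky(words)
print(best[(0, len(words), "S")])
```

Adding backpointers alongside `best` (as in the non-probabilistic parser) recovers the max-probability tree itself.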
55LING 180 Autumn 2007
CKY: Where does the max go?
56LING 180 Autumn 2007
Problems with PCFGs
The probability model we're using is just based on the rules in the derivation…
Doesn't use the words in any real way
Doesn't take into account where in the derivation a rule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the scheme…
Infiltrate the predilections of particular words into the probabilities in the derivation
I.e., condition the rule probabilities on the actual words
58LING 180 Autumn 2007
Heads
To do that, we're going to make use of the notion of the head of a phrase
The head of an NP is its noun
The head of a VP is its verb
The head of a PP is its preposition
(It's really more complicated than that, but this will do)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
61LING 180 Autumn 2007
How
We used to have:
VP -> V NP PP    P(rule|VP)
– That's the count of this rule divided by the number of VPs in a treebank

Now we have:
VP(dumped) -> V(dumped) NP(sacks) PP(in)
P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck, exploit independence and collect the statistics you can…
We'll focus on capturing two things:
Verb subcategorization
– Particular verbs have affinities for particular VPs
Objects' affinities for their predicates (mostly their mothers and grandmothers)
– Some objects fit better with some predicates than others
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their head… so
r: VP -> V NP PP    P(r|VP)
becomes
P(r | VP ^ dumped)

What's the count?
How many times was this rule used with (head) dump, divided by the number of VPs that dump appears (as head) in total
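A sketch of that conditioned estimate with made-up counts (the numbers are illustrative, not real treebank counts):

```python
from collections import Counter

# Hypothetical counts: (rule, VP head) -> count, plus total VPs per head.
rule_head_count = Counter({
    (("VP", ("V", "NP", "PP")), "dumped"): 6,
    (("VP", ("V", "NP")), "dumped"): 3,
    (("VP", ("V",)), "dumped"): 1,
})
vp_head_total = Counter({"dumped": 10})

def p_rule_given_head(rule, head):
    # P(r | VP ^ head): times the rule is used with this head,
    # divided by the number of VPs with that head overall.
    return rule_head_count[(rule, head)] / vp_head_total[head]

print(p_rule_given_head(("VP", ("V", "NP", "PP")), "dumped"))   # 6/10 = 0.6
```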
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads (verbs) and the VP rules they go with.
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP. So the affinities we care about are the ones between dumped and into vs. sacks and into.
So, count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize.
Vs. the situation where sacks is a constituent with into as the head of a PP daughter.
68LING 180 Autumn 2007
Preferences (2)
Consider the VPs:
Ate spaghetti with gusto
Ate spaghetti with marinara

The affinity of gusto for ate is much larger than its affinity for spaghetti.
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate.
69LING 180 Autumn 2007
Preferences (2)
Note: the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs.

[Figure: two trees — in "Ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "Ate spaghetti with marinara" it attaches to NP(spaghetti).]
70LING 180 Autumn 2007
Summary
Context-Free Grammars
Parsing
Top Down, Bottom Up Metaphors
Dynamic Programming Parsers: CKY, Earley
Disambiguation
PCFG
Probabilistic Augmentations to Parsers
Treebanks
71LING 180 Autumn 2007
Optional section: the Earley algorithm
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (i.e., in Chomsky Normal Form).
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal.
Except when you change the grammar, the trees come out wrong.
All things being equal, we'd prefer to leave the grammar alone.
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGs
Where CKY is bottom-up, Earley is top-down
Fills a table in a single sweep over the input words
Table is length N+1; N is the number of words
Table entries represent:
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents
74LING 180 Autumn 2007
States
The table entries are called states and are represented with dotted rules:

S -> · VP                A VP is predicted
NP -> Det · Nominal      An NP is in progress
VP -> V NP ·             A VP has been found
75LING 180 Autumn 2007
States/Locations
It would be nice to know where these things are in the input, so…

S -> · VP [0,0]              A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]           A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete.
If that's the case, you're done.
S -> α · [0, n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through the chart left-to-right.
At each step, apply 1 of 3 operators:
Predictor
– Create new states representing top-down expectations
Scanner
– Match word predictions (rule with word after dot) to words
Completer
– When a state is complete, see what rules were looking for that completed constituent
79LING 180 Autumn 2007
Predictor
Given a state
With a non-terminal to the right of the dot
That is not a part-of-speech category:
Create a new state for each expansion of the non-terminal.
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.
So predictor looking at
– S -> · VP [0,0]
results in
– VP -> · Verb [0,0]
– VP -> · Verb NP [0,0]
80LING 180 Autumn 2007
Scanner
Given a state
With a non-terminal to the right of the dot
That is a part-of-speech category:
If the next word in the input matches this part-of-speech,
create a new state with the dot moved over the non-terminal.
So scanner looking at
– VP -> · Verb NP [0,0]
If the next word, "book", can be a verb, add new state
– VP -> Verb · NP [0,1]
Add this state to the chart entry following the current one.
Note: the Earley algorithm uses top-down input to disambiguate POS: only a POS predicted by some state can get added to the chart.
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached the right end of the rule.
The parser has discovered a category over some span of input.
Find and advance all previous states that were looking for this category:
copy state, move dot, insert in current chart entry.
Given:
NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]
Add:
VP -> Verb NP · [0,3]
82LING 180 Autumn 2007
Earley: how do we know we are done?

Find an S state in the final column that spans from 0 to n+1 and is complete.
If that's the case, you're done.
S -> α · [0, n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1…
New predicted states are created by starting top-down from S.
New incomplete states are created by advancing existing states as new constituents are discovered.
New complete states are created in the same way.
84LING 180 Autumn 2007
Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at N+1 to see if you have a winner
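The loop above can be sketched as a small Earley recognizer (a sketch, not the textbook implementation; the grammar is a hypothetical fragment for "Book that flight", and a dummy gamma -> S state seeds the chart):

```python
# Hypothetical grammar fragment; POS maps words to their possible tags.
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
}
POS = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
PARTS_OF_SPEECH = {"Verb", "Det", "Noun"}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]     # chart[k]: states ending at position k

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    # A state is (lhs, rhs tuple, dot position, start position).
    add(("gamma", ("S",), 0, 0), 0)        # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):           # chart[k] may grow as we process it
            lhs, rhs, dot, start = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in PARTS_OF_SPEECH:              # Scanner
                    if k < n and nxt in POS.get(words[k].lower(), set()):
                        add((nxt, (words[k],), 1, k), k + 1)
                elif nxt in GRAMMAR:                    # Predictor
                    for expansion in GRAMMAR[nxt]:
                        add((nxt, tuple(expansion), 0, k), k)
            else:                                       # Completer
                for lhs2, rhs2, dot2, start2 in chart[start]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add((lhs2, rhs2, dot2 + 1, start2), k)
    # Done iff a complete S spans the whole input.
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])

print(earley_recognize("Book that flight".split()))   # True
```

Augmenting the Completer to record which completed states it used (backpointers) is exactly the recognizer-to-parser step the later slides describe.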
85LING 180 Autumn 2007
Example
Book that flight
We should find… an S from 0 to 3 that is a completed state…
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (both Earley and CKY)?
Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
No… Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
93LING 180 Autumn 2007
Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser.
Augment the "Completer" to point to where we came from.
94LING 180 Autumn 2007
Augmenting the chart with structural information

[Figure: chart states S8–S13; the Completer records backpointers, e.g. a completed state's entry lists the states (S8, S9, …) it was built from.]
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the table.
We just need to read off all the backpointers from every complete S in the last column of the table:
Find all the S -> X · [0, N+1]
Follow the structural traces from the Completer.
Of course this won't be polynomial time, since there could be an exponential number of trees.
So we can at least represent ambiguity efficiently.
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPsAte spaghetti with gustoAte spaghetti with marinara
The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs
Vp (ate) Vp(ate)
Vp(ate) Pp(with)Pp(with)
Np(spag)
npvvAte spaghetti with marinaraAte spaghetti with gusto
np
70LING 180 Autumn 2007
Summary
Context-Free GrammarsParsing
Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley
DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks
71LING 180 Autumn 2007
Optional section the Earley alg
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down
Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent
ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are representedwith dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the inputsohellip
S -gt VP [00] A VP is predicted at the start of the sentence
NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2
VP -gt V NP [03] A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-rightAt each step apply 1 of 3 operators
Predictorndash Create new states representing top-down expectations
Scannerndash Match word predictions (rule with word after dot) to
words
Completerndash When a state is complete see what rules were looking
for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at
ndash S -gt VP [00]
results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]
80LING 180 Autumn 2007
Scanner
Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at
ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state
ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category
copy state move dot insert in current chart entryGiven
NP -gt Det Nominal [13]VP -gt Verb NP [01]
AddVP -gt Verb NP [03]
82LING 180 Autumn 2007
Earley how do we know we aredone
How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specificallyhellip1 Predict all the states you can upfront2 Read a word
1 Extend states based on matches2 Add new predictions3 Go to 2
3 Look at N+1 to see if you have a winner
15
85LING 180 Autumn 2007
Example
Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (bothEarley and CKY)
Not parsers ndash recognizersndash The presence of an S state with the right attributes in
the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it
16
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer
ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition
ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
93LING 180 Autumn 2007
Converting Earley fromRecognizer to Parser
With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom
94LING 180 Autumn 2007
Augmenting the chart withstructural information
S8S9
S10
S11
S13S12
S8
S9
S8
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently
6
31LING 180 Autumn 2007
Example
(Figure: CKY chart for "John called Mary from Denver", with cells containing NP, V, P, PP, VP, S, and X entries; the full table does not survive extraction.)
32LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
33LING 180 Autumn 2007
Ambiguity
34LING 180 Autumn 2007
Ambiguity
No… Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer:
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not!) an exponential problem in polynomial time
35LING 180 Autumn 2007
Converting CKY from Recognizer to Parser
With the addition of a few pointers, we have a parser.
Augment each new cell in the chart to point to where we came from.
36LING 180 Autumn 2007
How to do parse disambiguation
Probabilistic methods:
Augment the grammar with probabilities
Then modify the parser to keep only the most probable parses
And at the end, return the most probable parse
37LING 180 Autumn 2007
Probabilistic CFGs
The probabilistic model: assigning probabilities to parse trees
Getting the probabilities for the model
Parsing with probabilities:
Slight modification to the dynamic programming approach
The task is to find the max-probability tree for an input
38LING 180 Autumn 2007
Probability Model
Attach probabilities to grammar rules.
The expansions for a given non-terminal sum to 1:
VP -> Verb          .55
VP -> Verb NP       .40
VP -> Verb NP NP    .05
Read this as P(Specific rule | LHS)
39LING 180 Autumn 2007
PCFG
40LING 180 Autumn 2007
PCFG
41LING 180 Autumn 2007
Probability Model (1)
A derivation (tree) consists of the set of grammar rules that are in the tree.
The probability of a tree is just the product of the probabilities of the rules in the derivation.
42LING 180 Autumn 2007
Probability model
P(T,S) = P(T) P(S|T) = P(T), since P(S|T) = 1
P(T,S) = ∏_{n ∈ T} p(r(n))
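The product formula above can be sketched in code. The mini-grammar and tree below are illustrative assumptions, not the slides' treebank grammar:

```python
# Sketch: P(T,S) as the product of the probabilities of the rules
# used in the derivation. Toy PCFG for illustration only.
from math import prod

rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("John",)): 0.5,
    ("NP", ("Mary",)): 0.5,
    ("VP", ("Verb", "NP")): 0.4,
    ("Verb", ("called",)): 1.0,
}

# A tree as (label, [children]); leaves are plain strings.
tree = ("S", [("NP", ["John"]),
              ("VP", [("Verb", ["called"]), ("NP", ["Mary"])])])

def rules_in(tree):
    """Yield every (LHS, RHS) rule used in the derivation."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules_in(c)

def tree_prob(tree):
    return prod(rule_probs[r] for r in rules_in(tree))

print(tree_prob(tree))  # 1.0 * 0.5 * 0.4 * 1.0 * 0.5 = 0.1
```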
43LING 180 Autumn 2007
Probability Model (1.1)
The probability of a word sequence P(S) is the probability of its tree in the unambiguous case.
It's the sum of the probabilities of the trees in the ambiguous case.
44LING 180 Autumn 2007
Getting the Probabilities
From an annotated database (a treebank).
So, for example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall.
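This count-and-divide estimate can be sketched directly; the counts below are made-up toy numbers, not real treebank statistics:

```python
# Sketch: maximum-likelihood rule probabilities from treebank counts,
# P(rule | LHS) = count(LHS -> RHS) / count(LHS). Toy counts only.
from collections import Counter

rule_counts = Counter({
    ("VP", ("Verb",)): 55,
    ("VP", ("Verb", "NP")): 40,
    ("VP", ("Verb", "NP", "NP")): 5,
})

# Total occurrences of each LHS non-terminal
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

rule_probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(rule_probs[("VP", ("Verb",))])  # 55 / 100 = 0.55
```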
45LING 180 Autumn 2007
Treebanks
46LING 180 Autumn 2007
Treebanks
47LING 180 Autumn 2007
Treebanks
48LING 180 Autumn 2007
Treebank Grammars
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from those rules
Total: over 17,000 different grammar rules in the 1-million-word Treebank corpus.
51LING 180 Autumn 2007
Probabilistic Grammar Assumptions
We're assuming that there is a grammar to be used to parse with.
We're assuming the existence of a large robust dictionary with parts of speech.
We're assuming the ability to parse (i.e., a parser).
Given all that… we can parse probabilistically.
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approach.
Assign probabilities to constituents as they are completed and placed in the table.
Use the max probability for each constituent going up.
53LING 180 Autumn 2007
What's that last bullet mean?
Say we're talking about a final part of a parse: S -> NP VP, where the NP spans positions 0 to i and the VP spans i to j.
The probability of the S is… P(S -> NP VP) · P(NP) · P(VP)
The last two factors (shown in green on the slide) are already known: we're doing bottom-up parsing.
54LING 180 Autumn 2007
Max
I said the P(NP) is known.
What if there are multiple NPs for the span of text in question (0 to i)?
Take the max. (Where?)
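The max-over-analyses idea can be sketched as a probabilistic CKY cell update. The tiny CNF grammar, its probabilities, and the cell keys are illustrative assumptions, not the slides' grammar:

```python
# Sketch: probabilistic CKY keeping only the max-probability entry per
# (span, non-terminal). Toy CNF PCFG for illustration only.
from collections import defaultdict

lexical = {("Verb", "book"): 0.5, ("NP", "flight"): 0.3, ("Det", "that"): 1.0,
           ("Noun", "flight"): 0.7}
binary = {("S", "Verb", "NP"): 0.8, ("NP", "Det", "Noun"): 0.4}

def pcky(words):
    n = len(words)
    best = defaultdict(float)           # best[(i, j, A)] = max probability
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, A)] = max(best[(i, i + 1, A)], p)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):   # split point: B spans i..k, C spans k..j
                for (A, B, C), p in binary.items():
                    cand = p * best[(i, k, B)] * best[(k, j, C)]
                    if cand > best[(i, j, A)]:
                        best[(i, j, A)] = cand
    return best

best = pcky(["book", "that", "flight"])
print(best[(0, 3, "S")])  # 0.8 * 0.5 * (0.4 * 1.0 * 0.7) = 0.112
```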
55LING 180 Autumn 2007
CKY: where does the max go?
56LING 180 Autumn 2007
Problems with PCFGs
The probability model we're using is just based on the rules in the derivation…
It doesn't use the words in any real way.
It doesn't take into account where in the derivation a rule is used.
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the scheme…
Infiltrate the predilections of particular words into the probabilities in the derivation.
I.e., condition the rule probabilities on the actual words.
58LING 180 Autumn 2007
Heads
To do that, we're going to make use of the notion of the head of a phrase:
The head of an NP is its noun
The head of a VP is its verb
The head of a PP is its preposition
(It's really more complicated than that, but this will do.)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
61LING 180 Autumn 2007
How
We used to have:
VP -> V NP PP    P(rule | VP)
– That's the count of this rule divided by the number of VPs in a treebank
Now we have:
VP(dumped) -> V(dumped) NP(sacks) PP(in)
P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
Not likely to have significant counts in any treebank.
62LING 180 Autumn 2007
Declare Independence
When stuck, exploit independence and collect the statistics you can…
We'll focus on capturing two things:
Verb subcategorization
– Particular verbs have affinities for particular VPs
Objects' affinities for their predicates (mostly their mothers and grandmothers)
– Some objects fit better with some predicates than others
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their head… so
r: VP -> V NP PP    P(r | VP)
becomes
P(r | VP ^ dumped)
What's the count?
How many times this rule was used with (head) dump, divided by the number of VPs that dump appears in (as head) in total.
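The head-conditioned estimate can be sketched the same way as the plain MLE; the counts are invented toy numbers, not from any treebank:

```python
# Sketch: P(r | VP, head) = count(r used with head) / count(VPs headed by head).
# Toy counts for illustration only.
from collections import Counter

# (rule, head_verb) -> count, as if collected from a treebank
rule_head_counts = Counter({
    (("VP", ("V", "NP", "PP")), "dumped"): 6,
    (("VP", ("V", "NP")), "dumped"): 4,
})

head_counts = Counter()
for (rule, head), c in rule_head_counts.items():
    head_counts[head] += c

def p_rule_given_head(rule, head):
    return rule_head_counts[(rule, head)] / head_counts[head]

print(p_rule_given_head(("VP", ("V", "NP", "PP")), "dumped"))  # 6 / 10 = 0.6
```

With real words the joint counts get sparse fast, which is exactly the "not likely to have significant counts" problem the slides note.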
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads (verbs) and the VP rules they go with.
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP, so the affinities we care about are the ones between dumped and into vs. sacks and into.
So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize.
Vs. the situation where sacks is a constituent with into as the head of a PP daughter.
68LING 180 Autumn 2007
Preferences (2)
Consider the VPs:
Ate spaghetti with gusto
Ate spaghetti with marinara
The affinity of gusto for ate is much larger than its affinity for spaghetti.
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate.
69LING 180 Autumn 2007
Preferences (2)
Note: the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs.
(Tree fragments: in "Ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "Ate spaghetti with marinara" the PP(with) attaches to NP(spaghetti).)
70LING 180 Autumn 2007
Summary
Context-Free Grammars
Parsing:
Top-down and bottom-up metaphors
Dynamic programming parsers: CKY, Earley
Disambiguation:
PCFGs
Probabilistic augmentations to parsers
Treebanks
71LING 180 Autumn 2007
Optional section: the Earley algorithm
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (i.e., in Chomsky Normal Form).
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal.
Except that when you change the grammar, the trees come out wrong.
All things being equal, we'd prefer to leave the grammar alone.
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGs.
Where CKY is bottom-up, Earley is top-down.
Fills a table in a single sweep over the input words:
Table is length N+1; N is the number of words
Table entries represent
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents
74LING 180 Autumn 2007
States
The table entries are called states and are represented with dotted rules:
S -> · VP            (a VP is predicted)
NP -> Det · Nominal  (an NP is in progress)
VP -> V NP ·         (a VP has been found)
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the input, so…
S -> · VP [0,0]            (a VP is predicted at the start of the sentence)
NP -> Det · Nominal [1,2]  (an NP is in progress; the Det goes from 1 to 2)
VP -> V NP · [0,3]         (a VP has been found starting at 0 and ending at 3)
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
In this case, there should be an S state in the final column that spans from 0 to N and is complete.
If that's the case, you're done:
S -> α · [0,N]
78LING 180 Autumn 2007
Earley Algorithm
March through the chart left to right.
At each step, apply one of three operators:
Predictor
– Create new states representing top-down expectations
Scanner
– Match word predictions (rules with a word after the dot) to words
Completer
– When a state is complete, see what rules were looking for that completed constituent
79LING 180 Autumn 2007
Predictor
Given a state
With a non-terminal to the right of the dot
That is not a part-of-speech category:
Create a new state for each expansion of the non-terminal.
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.
So the predictor looking at
– S -> · VP [0,0]
results in
– VP -> · Verb [0,0]
– VP -> · Verb NP [0,0]
80LING 180 Autumn 2007
Scanner
Given a state
With a non-terminal to the right of the dot
That is a part-of-speech category:
If the next word in the input matches this part of speech,
Create a new state with the dot moved over the non-terminal.
So the scanner looking at
– VP -> · Verb NP [0,0]
If the next word, "book", can be a verb, add the new state
– VP -> Verb · NP [0,1]
Add this state to the chart entry following the current one.
Note: the Earley algorithm uses top-down input to disambiguate POS; only a POS predicted by some state can get added to the chart.
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached the right end of the rule.
The parser has discovered a category over some span of the input.
Find and advance all previous states that were looking for this category:
copy the state, move the dot, insert in the current chart entry.
Given
NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]
Add
VP -> Verb NP · [0,3]
82LING 180 Autumn 2007
Earley how do we know we aredone
How do we know when we are done?
Find an S state in the final column that spans from 0 to N and is complete.
If that's the case, you're done:
S -> α · [0,N]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to N…
New predicted states are created by starting top-down from S.
New incomplete states are created by advancing existing states as new constituents are discovered.
New complete states are created in the same way.
84LING 180 Autumn 2007
Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to step 2
3. Look at the last chart entry to see if you have a winner
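The three operators and the loop above can be sketched as a compact recognizer. The toy grammar, POS table, and the dummy "gamma" start state are assumptions for illustration, not the book's grammar:

```python
# Sketch of an Earley recognizer: Predictor, Scanner, Completer.
# chart[j] holds states ending at position j; a state is
# (LHS, RHS tuple, dot position, start position).
GRAMMAR = {
    "S":  [["VP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
}
POS = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
PARTS_OF_SPEECH = {"Verb", "Det", "Noun"}

def earley(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    def add(state, j):
        if state not in chart[j]:
            chart[j].append(state)
    add(("gamma", ("S",), 0, 0), 0)      # dummy start state
    for j in range(n + 1):
        for state in chart[j]:           # chart[j] may grow as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for expansion in GRAMMAR[rhs[dot]]:          # Predictor
                    add((rhs[dot], tuple(expansion), 0, j), j)
            elif dot < len(rhs) and rhs[dot] in PARTS_OF_SPEECH:
                if j < n and rhs[dot] in POS.get(words[j], set()):  # Scanner
                    add((lhs, rhs, dot + 1, start), j + 1)
            elif dot == len(rhs):                            # Completer
                for (l2, r2, d2, s2) in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), j)
    return any(s == ("gamma", ("S",), 1, 0) for s in chart[n])

print(earley(["book", "that", "flight"]))  # True
```

This is a recognizer, not a parser, exactly as the next slides discuss: it answers yes/no but records no backpointers.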
85LING 180 Autumn 2007
Example
Book that flight.
We should find… an S from 0 to 3 that is a completed state…
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (both Earley and CKY)?
Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not!) an exponential problem in polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
No… Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
They both efficiently store the sub-parts that are shared between multiple parses.
But neither can tell us which one is right.
Not a parser – a recognizer:
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not!) an exponential problem in polynomial time
93LING 180 Autumn 2007
Converting Earley from Recognizer to Parser
With the addition of a few pointers, we have a parser.
Augment the Completer to point to where we came from.
94LING 180 Autumn 2007
Augmenting the chart with structural information
(Chart diagram: states S8–S13, with backpointers recorded by the Completer showing which states combined.)
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the table.
We just need to read off all the backpointers from every complete S in the last column of the table:
Find all the S -> X · [0,N]
Follow the structural traces from the Completer
Of course this won't be polynomial time, since there could be an exponential number of trees.
But we can at least represent ambiguity efficiently.
7
37LING 180 Autumn 2007
Probabilistic CFGs
The probabilistic modelAssigning probabilities to parse trees
Getting the probabilities for the modelParsing with probabilities
Slight modification to dynamic programmingapproachTask is to find the max probability tree for an input
38LING 180 Autumn 2007
Probability Model
Attach probabilities to grammar rulesThe expansions for a given non-terminal sum to 1VP -gt Verb 55VP -gt Verb NP 40VP -gt Verb NP NP 05
Read this as P(Specific rule | LHS)
39LING 180 Autumn 2007
PCFG
40LING 180 Autumn 2007
PCFG
41LING 180 Autumn 2007
Probability Model (1)
A derivation (tree) consists of the set of grammarrules that are in the tree
The probability of a tree is just the product of theprobabilities of the rules in the derivation
42LING 180 Autumn 2007
Probability model
P(TS) = P(T)P(S|T) = P(T) since P(S|T)=1
P(TS) = p(rn )nT
8
43LING 180 Autumn 2007
Probability Model (11)
The probability of a word sequence P(S) is theprobability of its tree in the unambiguous caseItrsquos the sum of the probabilities of the trees in theambiguous case
44LING 180 Autumn 2007
Getting the Probabilities
From an annotated database (a treebank)So for example to get the probability for a particularVP rule just count all the times the rule is used anddivide by the number of VPs overall
45LING 180 Autumn 2007
TreeBanks
46LING 180 Autumn 2007
Treebanks
47LING 180 Autumn 2007
Treebanks
48LING 180 Autumn 2007
Treebank Grammars
9
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from thoserules
Total over 17000 different grammar rules in the 1-million word Treebank corpus
51LING 180 Autumn 2007
Probabilistic GrammarAssumptions
Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up
53LING 180 Autumn 2007
Whatrsquos that last bullet mean
Say wersquore talking about a final part of a parseS-gt0NPiVPj
The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)
The green stuff is already known Wersquore doing bottom-up parsing
54LING 180 Autumn 2007
Max
I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)
10
55LING 180 Autumn 2007
CKY Where does the max go
56LING 180 Autumn 2007
Problems with PCFGs
The probability model wersquore using is just based on therules in the derivationhellip
Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords
58LING 180 Autumn 2007
Heads
To do that wersquore going to make use of the notion ofthe head of a phrase
The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition
(Itrsquos really more complicated than that but this will do)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
11
61LING 180 Autumn 2007
How
We used to haveVP -gt V NP PP P(rule|VP)
ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank
Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things
Verb subcategorizationndash Particular verbs have affinities for particular VPs
Objects affinities for their predicates (mostly theirmothers and grandmothers)
ndash Some objects fit better with some predicates thanothers
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes
P(r | VP ^ dumped)
Whatrsquos the countHow many times was this rule used with (head) dump
divided by the number of VPs that dump appears (ashead) in total
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPsAte spaghetti with gustoAte spaghetti with marinara
The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs
Vp (ate) Vp(ate)
Vp(ate) Pp(with)Pp(with)
Np(spag)
npvvAte spaghetti with marinaraAte spaghetti with gusto
np
70LING 180 Autumn 2007
Summary
Context-Free GrammarsParsing
Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley
DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks
71LING 180 Autumn 2007
Optional section the Earley alg
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down
Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent
ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are representedwith dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the inputsohellip
S -gt VP [00] A VP is predicted at the start of the sentence
NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2
VP -gt V NP [03] A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-rightAt each step apply 1 of 3 operators
Predictorndash Create new states representing top-down expectations
Scannerndash Match word predictions (rule with word after dot) to
words
Completerndash When a state is complete see what rules were looking
for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at
ndash S -gt VP [00]
results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]
80LING 180 Autumn 2007
Scanner
43LING 180 Autumn 2007
Probability Model (1.1)
The probability of a word sequence P(S) is the probability of its tree in the unambiguous case
It's the sum of the probabilities of the trees in the ambiguous case
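A minimal sketch of this model (the grammar, rule probabilities, and function names below are invented for illustration): a tree's probability is the product of the probabilities of the rules used in it, and an ambiguous sentence's probability is the sum over its trees.

```python
from math import prod

# Toy PCFG rule probabilities (hypothetical numbers, for illustration only).
P = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "Noun")): 0.6,
    ("NP", ("Noun",)): 0.4,
    ("VP", ("Verb", "NP")): 1.0,
}

def tree_prob(rules_used):
    """Probability of one parse tree: product of its rule probabilities."""
    return prod(P[r] for r in rules_used)

def sentence_prob(trees):
    """Ambiguous sentence: sum of the probabilities of its parse trees."""
    return sum(tree_prob(t) for t in trees)
```

With two hypothetical trees for the same string, one using NP -> Det Noun twice (probability 0.36) and one using NP -> Noun twice (probability 0.16), the sentence probability is their sum, 0.52.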
44LING 180 Autumn 2007
Getting the Probabilities
From an annotated database (a treebank)
So, for example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall
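That count-and-divide step can be sketched directly; the tiny "treebank" below is hypothetical, with trees as nested tuples and leaves as (POS, word) pairs.

```python
from collections import Counter

# A tiny hypothetical treebank; leaves are (POS, word) pairs.
TREEBANK = [
    ("S", ("NP", ("Det", "the"), ("Noun", "dog")),
          ("VP", ("Verb", "barked"))),
    ("S", ("NP", ("Det", "the"), ("Noun", "cat")),
          ("VP", ("Verb", "saw"), ("NP", ("Det", "a"), ("Noun", "dog")))),
]

def rule_probs(trees):
    """Maximum-likelihood rule probabilities: count(A -> beta) / count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    def walk(node):
        if isinstance(node[1], tuple):      # internal node, not a (POS, word) leaf
            lhs = node[0]
            rhs = tuple(child[0] for child in node[1:])
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
            for child in node[1:]:
                walk(child)
    for tree in trees:
        walk(tree)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```

Here VP -> Verb and VP -> Verb NP each occur once out of two VPs, so each gets probability 0.5.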
45LING 180 Autumn 2007
TreeBanks
46LING 180 Autumn 2007
Treebanks
47LING 180 Autumn 2007
Treebanks
48LING 180 Autumn 2007
Treebank Grammars
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from those rules

Total: over 17,000 different grammar rules in the 1-million-word Treebank corpus
51LING 180 Autumn 2007
Probabilistic Grammar Assumptions

We're assuming that there is a grammar to be used to parse with
We're assuming the existence of a large robust dictionary with parts of speech
We're assuming the ability to parse (i.e. a parser)
Given all that… we can parse probabilistically
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approach
Assign probabilities to constituents as they are completed and placed in the table
Use the max probability for each constituent going up
53LING 180 Autumn 2007
What's that last bullet mean?

Say we're talking about a final part of a parse: S -> NP VP, where the NP spans 0 to i and the VP spans i to j
The probability of the S is…
P(S -> NP VP) · P(NP) · P(VP)
The "green stuff", P(NP) and P(VP), is already known: we're doing bottom-up parsing
54LING 180 Autumn 2007
Max
I said the P(NP) is known
What if there are multiple NPs for the span of text in question (0 to i)?
Take the max (where?)
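As a sketch of this bottom-up, max-taking approach (the CNF grammar, probabilities, and function name are invented for illustration): each table cell keeps, for every non-terminal, the maximum probability over all split points.

```python
from collections import defaultdict

# Tiny CNF grammar with probabilities (hypothetical numbers).
RULES = {  # (A, (B, C)) -> probability of A -> B C
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "Noun")): 1.0,
    ("VP", ("Verb", "NP")): 1.0,
}
LEX = {  # (A, word) -> probability of A -> word
    ("Det", "the"): 1.0,
    ("Noun", "dog"): 0.5,
    ("Noun", "cat"): 0.5,
    ("Verb", "saw"): 1.0,
}

def pcky(words):
    """Max probability of an S over the whole input, via probabilistic CKY."""
    n = len(words)
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                      # fill the diagonal
        for (pos, word), p in LEX.items():
            if word == w:
                table[i][i + 1][pos] = max(table[i][i + 1][pos], p)
    for span in range(2, n + 1):                       # longer spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point: the max goes here
                for (a, (b, c)), p in RULES.items():
                    if table[i][k][b] > 0 and table[k][j][c] > 0:
                        cand = p * table[i][k][b] * table[k][j][c]
                        if cand > table[i][j][a]:
                            table[i][j][a] = cand
    return table[0][n].get("S", 0.0)
```

For "the dog saw the cat" under these toy numbers, the best S gets 1.0 · (1.0 · 1.0 · 0.5) · (1.0 · 1.0 · 0.5) = 0.25.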
55LING 180 Autumn 2007
CKY: Where does the max go?
56LING 180 Autumn 2007
Problems with PCFGs
The probability model we're using is just based on the rules in the derivation…

Doesn't use the words in any real way
Doesn't take into account where in the derivation a rule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the scheme…
Infiltrate the predilections of particular words into the probabilities in the derivation
I.e. condition the rule probabilities on the actual words
58LING 180 Autumn 2007
Heads
To do that we're going to make use of the notion of the head of a phrase

The head of an NP is its noun
The head of a VP is its verb
The head of a PP is its preposition

(It's really more complicated than that, but this will do)
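A deliberately simplified head-finding sketch of those three rules (real head-percolation tables are far more detailed, as the parenthetical warns; the table and names here are hypothetical):

```python
# Simplified head table: which child category carries the head (assumption).
HEAD_POS = {"NP": "Noun", "VP": "Verb", "PP": "Prep"}

def head_word(phrase):
    """phrase = (label, children...); leaves are (POS, word) pairs."""
    if isinstance(phrase[1], str):          # preterminal: (POS, word)
        return phrase[1]
    target = HEAD_POS.get(phrase[0])
    for child in phrase[1:]:                # find the head-bearing child
        if child[0] == target:
            return head_word(child)
    return head_word(phrase[1])             # fallback: first child
```

So the head of the NP "the dog" is "dog", and the head of a VP headed by "dumped" is "dumped".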
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
61LING 180 Autumn 2007
How
We used to have:
VP -> V NP PP, with probability P(rule | VP)
– That's the count of this rule divided by the number of VPs in a treebank
Now we have:
VP(dumped) -> V(dumped) NP(sacks) PP(in), with probability
P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck, exploit independence and collect the statistics you can…
We'll focus on capturing two things:

Verb subcategorization
– Particular verbs have affinities for particular VPs
Objects' affinities for their predicates (mostly their mothers and grandmothers)
– Some objects fit better with some predicates than others
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their head… so for rule r: VP -> V NP PP,
P(r | VP) becomes
P(r | VP ^ dumped)

What's the count?
How many times was this rule used with (head) dumped, divided by the number of VPs that dumped appears in (as head) in total
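That count can be sketched with made-up data (the observation list and names below are hypothetical, not real treebank counts):

```python
from collections import Counter

# Hypothetical (VP-head, rule-RHS) observations pulled from a treebank.
OBSERVATIONS = [
    ("dumped", ("V", "NP", "PP")),
    ("dumped", ("V", "NP", "PP")),
    ("dumped", ("V", "NP")),
    ("slept", ("V",)),
]

def subcat_prob(rhs, head, observations):
    """P(r | VP, head) = count(head used with r) / count(head as a VP head)."""
    pair_counts = Counter(observations)
    head_counts = Counter(h for h, _ in observations)
    return pair_counts[(head, rhs)] / head_counts[head]
```

With these toy counts, P(VP -> V NP PP | VP, dumped) = 2/3.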
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads (verbs) and the VP rules they go with
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP. So the affinities we care about are the ones between dumped and into vs. sacks and into.
So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize.
Vs. the situation where sacks is a constituent with into as the head of a PP daughter.
68LING 180 Autumn 2007
Preferences (2)
Consider the VPs:
Ate spaghetti with gusto
Ate spaghetti with marinara

The affinity of gusto for ate is much larger than its affinity for spaghetti
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs

[Parse trees: in "Ate spaghetti with gusto" the Pp(with) attaches to the Vp(ate); in "Ate spaghetti with marinara" the Pp(with) attaches to the Np(spag).]
70LING 180 Autumn 2007
Summary
Context-Free Grammars
Parsing
– Top Down, Bottom Up Metaphors
– Dynamic Programming Parsers: CKY, Earley
Disambiguation
– PCFG
– Probabilistic Augmentations to Parsers
– Treebanks
71LING 180 Autumn 2007
Optional section: the Earley algorithm
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (i.e. in Chomsky Normal Form)
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal
Except when you change the grammar, the trees come out wrong
All things being equal, we'd prefer to leave the grammar alone
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGs
Where CKY is bottom-up, Earley is top-down
Fills a table in a single sweep over the input words
– Table is length N+1; N is number of words
– Table entries represent:
   – Completed constituents and their locations
   – In-progress constituents
   – Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are represented with dotted-rules:

S -> • VP          A VP is predicted
NP -> Det • Nominal          An NP is in progress
VP -> V NP •          A VP has been found
75LING 180 Autumn 2007
States/Locations

It would be nice to know where these things are in the input, so…

S -> • VP [0,0]          A VP is predicted at the start of the sentence
NP -> Det • Nominal [1,2]          An NP is in progress; the Det goes from 1 to 2
VP -> V NP • [0,3]          A VP has been found starting at 0 and ending at 3
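These dotted states can be represented directly; a minimal sketch (the class and field names are my own, not from the lecture):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str        # left-hand side of the rule
    rhs: tuple      # right-hand side symbols
    dot: int        # how many RHS symbols have been recognized
    start: int      # input position where the constituent begins
    end: int        # input position the dot has reached

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        parts = [self.lhs, "->", *self.rhs[:self.dot], ".", *self.rhs[self.dot:]]
        return " ".join(parts) + f" [{self.start},{self.end}]"
```

For example, `State("NP", ("Det", "Nominal"), 1, 1, 2)` prints as `NP -> Det . Nominal [1,2]`: an in-progress NP whose Det spans 1 to 2.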
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place
In this case, there should be an S state in the final column that spans from 0 to n and is complete
If that's the case, you're done:

S -> α • [0,n]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-right
At each step, apply 1 of 3 operators:

Predictor
– Create new states representing top-down expectations
Scanner
– Match word predictions (rule with word after dot) to words
Completer
– When a state is complete, see what rules were looking for that completed constituent
79LING 180 Autumn 2007
Predictor
Given a state
– With a non-terminal to right of dot
– That is not a part-of-speech category
Create a new state for each expansion of the non-terminal
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So predictor looking at
– S -> • VP [0,0]
results in
– VP -> • Verb [0,0]
– VP -> • Verb NP [0,0]
80LING 180 Autumn 2007
Scanner
Given a state
– With a non-terminal to right of dot
– That is a part-of-speech category
If the next word in the input matches this part-of-speech
Create a new state with the dot moved over that category
So scanner looking at
– VP -> • Verb NP [0,0]
If the next word, "book", can be a verb, add new state
– VP -> Verb • NP [0,1]
Add this state to the chart entry following the current one
Note: the Earley algorithm uses top-down input to disambiguate POS: only a POS predicted by some state can get added to the chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached the right end of the rule
Parser has discovered a category over some span of input
Find and advance all previous states that were looking for this category
– Copy state, move dot, insert in current chart entry
Given:
NP -> Det Nominal • [1,3]
VP -> Verb • NP [0,1]
Add:
VP -> Verb NP • [0,3]
82LING 180 Autumn 2007
Earley: how do we know we are done?

How do we know when we are done?
Find an S state in the final column that spans from 0 to n and is complete
If that's the case, you're done:

S -> α • [0,n]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n…
– New predicted states are created by starting top-down from S
– New incomplete states are created by advancing existing states as new constituents are discovered
– New complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry to see if you have a winner
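The loop above can be sketched as a tiny recognizer; the grammar, lexicon, and names are hypothetical, and as the next slides note it only recognizes — a real implementation would also keep backpointers.

```python
# Toy grammar and lexicon (hypothetical, for illustration only).
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Verb", "Det", "Noun"}

def earley_recognize(words):
    n = len(words)
    # A state is (lhs, rhs, dot, start); chart[i] holds states whose dot is at i.
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("GAMMA", ("S",), 0, 0))            # dummy start state
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in POS:    # Scanner
                if i < n and rhs[dot] in LEXICON.get(words[i], ()):
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
            elif dot < len(rhs):                      # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], expansion, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            else:                                     # Completer
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, s2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    # Winner: the dummy rule, complete, spanning the whole input.
    return ("GAMMA", ("S",), 1, 0) in chart[n]
```

On "book that flight" this finds a complete S from 0 to 3, so it accepts; reordering the words makes it reject.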
85LING 180 Autumn 2007
Example
Book that flight
We should find… an S from 0 to 3 that is a completed state…
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (both Earley and CKY)?
Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry
They both efficiently store the sub-parts that are shared between multiple parses
But neither can tell us which one is right
Not a parser – a recognizer
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
93LING 180 Autumn 2007
Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser
Augment the "Completer" to point to where we came from
94LING 180 Autumn 2007
Augmenting the chart with structural information

[Figure: chart entries showing states S8–S13, with backpointers added by the Completer linking later states (e.g. S12, S13) back to the earlier states (S8, S9) they were built from.]
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the table
We just need to read off all the backpointers from every complete S in the last column of the table:
– Find all the S -> α • [0,n]
– Follow the structural traces from the Completer
Of course this won't be polynomial time, since there could be an exponential number of trees
But we can at least represent ambiguity efficiently
9
49LING 180 Autumn 2007
Lots of flat rules
50LING 180 Autumn 2007
Example sentences from thoserules
Total over 17000 different grammar rules in the 1-million word Treebank corpus
51LING 180 Autumn 2007
Probabilistic GrammarAssumptions
Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically
52LING 180 Autumn 2007
Typical Approach
Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up
53LING 180 Autumn 2007
Whatrsquos that last bullet mean
Say wersquore talking about a final part of a parseS-gt0NPiVPj
The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)
The green stuff is already known Wersquore doing bottom-up parsing
54LING 180 Autumn 2007
Max
I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)
10
55LING 180 Autumn 2007
CKY Where does the max go
56LING 180 Autumn 2007
Problems with PCFGs
The probability model wersquore using is just based on therules in the derivationhellip
Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords
58LING 180 Autumn 2007
Heads
To do that wersquore going to make use of the notion ofthe head of a phrase
The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition
(Itrsquos really more complicated than that but this will do)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
11
61LING 180 Autumn 2007
How
We used to haveVP -gt V NP PP P(rule|VP)
ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank
Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things
Verb subcategorizationndash Particular verbs have affinities for particular VPs
Objects affinities for their predicates (mostly theirmothers and grandmothers)
ndash Some objects fit better with some predicates thanothers
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes
P(r | VP ^ dumped)
Whatrsquos the countHow many times was this rule used with (head) dump
divided by the number of VPs that dump appears (ashead) in total
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPsAte spaghetti with gustoAte spaghetti with marinara
The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs
Vp (ate) Vp(ate)
Vp(ate) Pp(with)Pp(with)
Np(spag)
npvvAte spaghetti with marinaraAte spaghetti with gusto
np
70LING 180 Autumn 2007
Summary
Context-Free GrammarsParsing
Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley
DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks
71LING 180 Autumn 2007
Optional section the Earley alg
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down
Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent
ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are representedwith dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the inputsohellip
S -gt VP [00] A VP is predicted at the start of the sentence
NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2
VP -gt V NP [03] A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-rightAt each step apply 1 of 3 operators
Predictorndash Create new states representing top-down expectations
Scannerndash Match word predictions (rule with word after dot) to
words
Completerndash When a state is complete see what rules were looking
for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at
ndash S -gt VP [00]
results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]
80LING 180 Autumn 2007
Scanner
Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at
ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state
ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category
copy state move dot insert in current chart entryGiven
NP -gt Det Nominal [13]VP -gt Verb NP [01]
AddVP -gt Verb NP [03]
82LING 180 Autumn 2007
Earley how do we know we aredone
How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specificallyhellip1 Predict all the states you can upfront2 Read a word
1 Extend states based on matches2 Add new predictions3 Go to 2
3 Look at N+1 to see if you have a winner
15
85LING 180 Autumn 2007
Example
Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (bothEarley and CKY)
Not parsers ndash recognizersndash The presence of an S state with the right attributes in
the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it
16
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer
ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition
ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
93LING 180 Autumn 2007
Converting Earley fromRecognizer to Parser
With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom
94LING 180 Autumn 2007
Augmenting the chart withstructural information
S8S9
S10
S11
S13S12
S8
S9
S8
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently
10
55LING 180 Autumn 2007
CKY Where does the max go
56LING 180 Autumn 2007
Problems with PCFGs
The probability model wersquore using is just based on therules in the derivationhellip
Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used
57LING 180 Autumn 2007
Solution
Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords
58LING 180 Autumn 2007
Heads
To do that wersquore going to make use of the notion ofthe head of a phrase
The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition
(Itrsquos really more complicated than that but this will do)
59LING 180 Autumn 2007
Example (right)
Attribute grammar
60LING 180 Autumn 2007
Example (wrong)
11
61LING 180 Autumn 2007
How
We used to haveVP -gt V NP PP P(rule|VP)
ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank
Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things
Verb subcategorizationndash Particular verbs have affinities for particular VPs
Objects affinities for their predicates (mostly theirmothers and grandmothers)
ndash Some objects fit better with some predicates thanothers
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes
P(r | VP ^ dumped)
Whatrsquos the countHow many times was this rule used with (head) dump
divided by the number of VPs that dump appears (ashead) in total
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPsAte spaghetti with gustoAte spaghetti with marinara
The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs
Vp (ate) Vp(ate)
Vp(ate) Pp(with)Pp(with)
Np(spag)
npvvAte spaghetti with marinaraAte spaghetti with gusto
np
70LING 180 Autumn 2007
Summary
Context-Free GrammarsParsing
Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley
DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks
71LING 180 Autumn 2007
Optional section the Earley alg
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down
Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent
ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents
74LING 180 Autumn 2007
States
The table-entries are called states and are representedwith dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
75LING 180 Autumn 2007
StatesLocations
It would be nice to know where these things are in the inputsohellip
S -gt VP [00] A VP is predicted at the start of the sentence
NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2
VP -gt V NP [03] A VP has been found starting at 0 and ending at 3
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-rightAt each step apply 1 of 3 operators
Predictorndash Create new states representing top-down expectations
Scannerndash Match word predictions (rule with word after dot) to
words
Completerndash When a state is complete see what rules were looking
for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at
ndash S -gt VP [00]
results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]
80LING 180 Autumn 2007
Scanner
Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at
ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state
ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category
copy state move dot insert in current chart entryGiven
NP -gt Det Nominal [13]VP -gt Verb NP [01]
AddVP -gt Verb NP [03]
82LING 180 Autumn 2007
Earley how do we know we aredone
How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done
S ndashgt α [0n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specificallyhellip1 Predict all the states you can upfront2 Read a word
1 Extend states based on matches2 Add new predictions3 Go to 2
3 Look at N+1 to see if you have a winner
15
85LING 180 Autumn 2007
Example
Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (bothEarley and CKY)
Not parsers ndash recognizersndash The presence of an S state with the right attributes in
the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it
16
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer
ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition
ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in
polynomial time
93LING 180 Autumn 2007
Converting Earley fromRecognizer to Parser
With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom
94LING 180 Autumn 2007
Augmenting the chart withstructural information
S8S9
S10
S11
S13S12
S8
S9
S8
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently
11
61LING 180 Autumn 2007
How
We used to haveVP -gt V NP PP P(rule|VP)
ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank
Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank
62LING 180 Autumn 2007
Declare Independence
When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things
Verb subcategorizationndash Particular verbs have affinities for particular VPs
Objects affinities for their predicates (mostly theirmothers and grandmothers)
ndash Some objects fit better with some predicates thanothers
63LING 180 Autumn 2007
Subcategorization
Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes
P(r | VP ^ dumped)
What's the count?
How many times was this rule used with (head) dump, divided by the number of VPs that dump appears (as head) in total
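A toy sketch of this maximum-likelihood estimate. The mini "treebank" of (head, expansion) events below is invented for illustration, not from any real corpus:

```python
# Invented toy data: one (VP head, expansion used for that VP) pair per VP node.
vp_events = [
    ("dump", "V NP PP"),
    ("dump", "V NP PP"),
    ("dump", "V NP"),
    ("eat",  "V NP"),
    ("eat",  "V NP PP"),
]

def p_rule_given_head(rule, head, events):
    """P(r | VP, head) = count(VPs headed by `head` expanding with r)
                         / count(VPs headed by `head`)."""
    head_total = sum(1 for h, _ in events if h == head)
    rule_count = sum(1 for h, r in events if h == head and r == rule)
    return rule_count / head_total if head_total else 0.0

print(p_rule_given_head("V NP PP", "dump", vp_events))  # 2/3
```

With real treebank counts the computation is identical; only the event list changes.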
64LING 180 Autumn 2007
Preferences
Subcat captures the affinity between VP heads (verbs) and the VP rules they go with
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…
65LING 180 Autumn 2007
Example (right)
66LING 180 Autumn 2007
Example (wrong)
12
67LING 180 Autumn 2007
Preferences
The issue here is the attachment of the PP. So the affinities we care about are the ones between dumped and into vs. sacks and into
So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize
Vs. the situation where sacks is a constituent with into as the head of a PP daughter
68LING 180 Autumn 2007
Preferences (2)
Consider the VPs
Ate spaghetti with gusto
Ate spaghetti with marinara
The affinity of gusto for ate is much larger than its affinity for spaghetti
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate
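These affinities can be read straight off head–head co-occurrence counts. A hypothetical sketch (the counts are invented; a real model would estimate smoothed probabilities from a treebank):

```python
# Invented toy counts: how often a PP with head `pp` attaches to a
# constituent headed by the given word in some hypothetical treebank.
affinity = {
    ("ate", "with_gusto"): 10, ("spaghetti", "with_gusto"): 1,
    ("ate", "with_marinara"): 1, ("spaghetti", "with_marinara"): 8,
}

def attach(verb, noun, pp):
    """Attach the PP to whichever candidate head has the higher affinity count."""
    return verb if affinity.get((verb, pp), 0) >= affinity.get((noun, pp), 0) else noun

print(attach("ate", "spaghetti", "with_gusto"))     # ate  (VP attachment)
print(attach("ate", "spaghetti", "with_marinara"))  # spaghetti  (NP attachment)
```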
69LING 180 Autumn 2007
Preferences (2)
Note the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs
[Tree diagrams: for "Ate spaghetti with gusto" the PP(with) attaches to VP(ate); for "Ate spaghetti with marinara" the PP(with) attaches to NP(spaghetti)]
70LING 180 Autumn 2007
Summary
Context-Free Grammars
Parsing
Top-Down and Bottom-Up Metaphors
Dynamic Programming Parsers: CKY, Earley
Disambiguation
PCFG
Probabilistic Augmentations to Parsers
Treebanks
71LING 180 Autumn 2007
Optional section: the Earley algorithm
72LING 180 Autumn 2007
Problem (minor)
We said CKY requires the grammar to be binary (i.e. in Chomsky Normal Form)
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal
Except when you change the grammar, the trees come out wrong
All things being equal, we'd prefer to leave the grammar alone
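One common workaround, sketched below, is to parse with the binarized grammar and then splice the dummy non-terminals back out of the resulting trees. The `X` naming convention for dummies is an assumption for illustration:

```python
# A tree is (label, children...); leaves are plain strings.
# Assumption: CNF binarization named its dummy nodes with an "X" prefix (e.g. "X1").

def unbinarize(tree):
    """Splice out dummy nodes introduced by CNF conversion, restoring n-ary rules."""
    if isinstance(tree, str):
        return tree
    label, *kids = tree
    new_kids = []
    for kid in map(unbinarize, kids):
        if not isinstance(kid, str) and kid[0].startswith("X"):
            new_kids.extend(kid[1:])   # promote the dummy's children in place
        else:
            new_kids.append(kid)
    return (label, *new_kids)

# CNF tree for VP -> V NP PP, binarized as VP -> V X1, X1 -> NP PP
cnf = ("VP", ("V", "dumped"), ("X1", ("NP", "sacks"), ("PP", "into")))
print(unbinarize(cnf))  # ('VP', ('V', 'dumped'), ('NP', 'sacks'), ('PP', 'into'))
```

Unit productions removed during conversion are harder to restore exactly; this sketch only undoes the dummy-node step.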
13
73LING 180 Autumn 2007
Earley Parsing
Allows arbitrary CFGs
Where CKY is bottom-up, Earley is top-down
Fills a table in a single sweep over the input words
Table is length N+1; N is the number of words
Table entries represent
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents
74LING 180 Autumn 2007
States
The table entries are called states and are represented with dotted rules
S -> · VP                A VP is predicted
NP -> Det · Nominal      An NP is in progress
VP -> V NP ·             A VP has been found
75LING 180 Autumn 2007
States/Locations
It would be nice to know where these things are in the input, so…
S -> · VP [0,0]              A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]           A VP has been found starting at 0 and ending at 3
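One possible encoding of such states (an illustrative sketch, not code from the lecture):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A dotted rule plus the span [start, end] of input it covers."""
    lhs: str
    rhs: tuple   # right-hand-side symbols
    dot: int     # position of the dot within rhs
    start: int
    end: int

    def complete(self):
        return self.dot >= len(self.rhs)

    def next_symbol(self):
        return None if self.complete() else self.rhs[self.dot]

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, "·")
        return f"{self.lhs} -> {' '.join(syms)} [{self.start},{self.end}]"

s = State("NP", ("Det", "Nominal"), 1, 1, 2)
print(s)                 # NP -> Det · Nominal [1,2]
print(s.next_symbol())   # Nominal
```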
76LING 180 Autumn 2007
Graphically
77LING 180 Autumn 2007
Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place
In this case there should be an S state in the final column that spans from 0 to n+1 and is complete
If that's the case, you're done
S -> α · [0,n+1]
78LING 180 Autumn 2007
Earley Algorithm
March through chart left-to-right
At each step, apply one of three operators
Predictor
– Create new states representing top-down expectations
Scanner
– Match word predictions (rule with word after dot) to words
Completer
– When a state is complete, see what rules were looking for that completed constituent
14
79LING 180 Autumn 2007
Predictor
Given a state
– With a non-terminal to right of dot
– That is not a part-of-speech category
Create a new state for each expansion of the non-terminal
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So predictor looking at
– S -> · VP [0,0]
results in
– VP -> · Verb [0,0]
– VP -> · Verb NP [0,0]
80LING 180 Autumn 2007
Scanner
Given a state
– With a non-terminal to right of dot
– That is a part-of-speech category
If the next word in the input matches this part-of-speech
Create a new state with the dot moved over the non-terminal
So scanner looking at
– VP -> · Verb NP [0,0]
If the next word "book" can be a verb, add new state
– VP -> Verb · NP [0,1]
Add this state to the chart entry following the current one
Note: the Earley algorithm uses top-down input to disambiguate POS
Only a POS predicted by some state can get added to the chart
81LING 180 Autumn 2007
Completer
Applied to a state when its dot has reached the right end of the rule
Parser has discovered a category over some span of input
Find and advance all previous states that were looking for this category
– copy state, move dot, insert in current chart entry
Given
NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]
Add
VP -> Verb NP · [0,3]
82LING 180 Autumn 2007
Earley: how do we know we are done?
How do we know when we are done?
Find an S state in the final column that spans from 0 to n+1 and is complete
If that's the case, you're done
S -> α · [0,n+1]
83LING 180 Autumn 2007
Earley
So sweep through the table from 0 to n+1…
New predicted states are created by starting top-down from S
New incomplete states are created by advancing existing states as new constituents are discovered
New complete states are created in the same way
84LING 180 Autumn 2007
Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at N+1 to see if you have a winner
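The three operators fit in a short recognizer. The grammar, lexicon, and tuple encoding of states below are toy assumptions chosen for the "book that flight" example (epsilon rules are not handled):

```python
GRAMMAR = {                      # assumed toy grammar for illustration
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Verb", "Det", "Noun"}

def earley_recognize(words):
    """Return True iff the toy grammar derives `words`. States are
    (lhs, rhs, dot, start) tuples; chart[i] holds states ending at position i."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(col, state):
        if state not in chart[col]:
            chart[col].append(state)

    add(0, ("GAMMA", ("S",), 0, 0))          # dummy start state
    for col in range(n + 1):
        i = 0
        while i < len(chart[col]):           # chart[col] may grow as we go
            lhs, rhs, dot, start = chart[col][i]
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in POS:               # Scanner
                    if col < n and nxt in LEXICON.get(words[col], set()):
                        add(col + 1, (lhs, rhs, dot + 1, start))
                else:                        # Predictor
                    for expansion in GRAMMAR.get(nxt, []):
                        add(col, (nxt, expansion, 0, col))
            else:                            # Completer
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(col, (l2, r2, d2 + 1, s2))
            i += 1
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])

print(earley_recognize("book that flight".split()))  # True
```

The scanner here advances a state over the POS symbol directly rather than adding a separate POS state, a common simplification for a recognizer.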
15
85LING 180 Autumn 2007
Example
Book that flight
We should find… an S from 0 to 3 that is a completed state…
86LING 180 Autumn 2007
Example
87LING 180 Autumn 2007
Example
88LING 180 Autumn 2007
Example
89LING 180 Autumn 2007
Details
What kind of algorithms did we just describe (both Earley and CKY)?
Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
90LING 180 Autumn 2007
Back to Ambiguity
Did we solve it?
16
91LING 180 Autumn 2007
Ambiguity
92LING 180 Autumn 2007
Ambiguity
No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry
They both efficiently store the sub-parts that are shared between multiple parses
But neither can tell us which one is right
Not a parser – a recognizer
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time
93LING 180 Autumn 2007
Converting Earley from Recognizer to Parser
With the addition of a few pointers, we have a parser
Augment the "Completer" to point to where we came from
94LING 180 Autumn 2007
Augmenting the chart with structural information
[Chart diagram: completed states S8–S13 linked by the backpointers the Completer adds]
95LING 180 Autumn 2007
Retrieving Parse Trees from Chart
All the possible parses for an input are in the table
We just need to read off all the backpointers from every complete S in the last column of the table
Find all the S -> X [0,N+1]
Follow the structural traces from the Completer
Of course this won't be polynomial time, since there could be an exponential number of trees
But we can at least represent ambiguity efficiently
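Schematically, reading a tree off the backpointers is just a recursive walk. The chart fragment below is hand-built for illustration; a real chart would store state objects whose backpointers the Completer filled in:

```python
# Toy illustration: each completed constituent carries backpointers to the
# completed child states the Completer advanced it over; POS entries hold words.

def tree(state):
    """Recursively follow backpointers to rebuild the parse tree."""
    label, children = state
    if isinstance(children, str):          # a POS entry: leaf word
        return (label, children)
    return (label, *[tree(child) for child in children])

# Hand-built fragment of a completed chart for "book that flight"
noun = ("Noun", "flight")
nominal = ("Nominal", [noun])
det = ("Det", "that")
np = ("NP", [det, nominal])
verb = ("Verb", "book")
vp = ("VP", [verb, np])
s = ("S", [vp])

print(tree(s))
# ('S', ('VP', ('Verb', 'book'), ('NP', ('Det', 'that'), ('Nominal', ('Noun', 'flight')))))
```

Enumerating every tree this way from an ambiguous chart is what can take exponential time, even though the chart itself stores the shared sub-parts compactly.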