
1 LING 180 Autumn 2007

LINGUIST 180: Introduction to Computational Linguistics

Lecture 10: Grammar and Parsing (II), October 25, 2007

Dan Jurafsky

Thanks to Jim Martin for many of these slides!

2 LING 180 Autumn 2007

Grammars and Parsing

Context-Free Grammars and Constituency
Some common CFG phenomena for English
Baby parsers: Top-down and Bottom-up Parsing
Today: Real parsers: Dynamic Programming parsing

CKY

Probabilistic parsing
Optional section: the Earley algorithm

3 LING 180 Autumn 2007

Dynamic Programming

We need a method that fills a table with partial results that:

Does not do (avoidable) repeated work
Does not fall prey to left-recursion
Can find all the pieces of an exponential number of trees in polynomial time

Two popular methods: CKY and Earley

4 LING 180 Autumn 2007

The CKY (Cocke-Kasami-Younger) Algorithm

Requires the grammar be in Chomsky Normal Form (CNF)

All rules must be in the following form:
– A -> B C
– A -> w

Any grammar can be converted automatically to Chomsky Normal Form

5 LING 180 Autumn 2007

Converting to CNF

Rules that mix terminals and non-terminals:
Introduce a new dummy non-terminal that covers the terminal

– INFVP -> to VP is replaced by:

– INFVP -> TO VP
– TO -> to

Rules that have a single non-terminal on the right ("unit productions"):
Rewrite each unit production with the RHS of their expansions

Rules whose right-hand side length is > 2:
Introduce dummy non-terminals that spread the right-hand side

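A minimal sketch of the mechanical part of this conversion (my own illustrative code, not the course's): terminals inside mixed rules get dummy pre-terminals, and rules whose right-hand side is longer than two symbols are binarized with dummy non-terminals. Collapsing unit productions is omitted here.

def to_cnf(rules):
    # rules: list of (lhs, rhs_tuple); terminals are lowercase strings (an assumption of this sketch)
    new_rules = []
    for lhs, rhs in rules:
        if len(rhs) > 1:
            fixed = []
            for sym in rhs:
                if sym.islower():                    # terminal inside a mixed rule
                    dummy = sym.upper()              # e.g. INFVP -> to VP introduces TO -> to
                    new_rules.append((dummy, (sym,)))
                    fixed.append(dummy)
                else:
                    fixed.append(sym)
            rhs = tuple(fixed)
        while len(rhs) > 2:                          # spread a long right-hand side
            dummy = lhs + "_" + rhs[0]
            new_rules.append((lhs, (rhs[0], dummy)))
            lhs, rhs = dummy, rhs[1:]
        new_rules.append((lhs, rhs))
    return new_rules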
6 LING 180 Autumn 2007

Automatic Conversion to CNF

7 LING 180 Autumn 2007

Sample Grammar

8 LING 180 Autumn 2007

Back to CKY Parsing

Given rules in CNF, consider the rule A -> B C

If there is an A in the input, then there must be a B followed by a C in the input
If the A goes from i to j in the input, then there must be some k s.t. i < k < j

– I.e., the B splits from the C someplace

9 LING 180 Autumn 2007

CKY

So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table
So a non-terminal spanning an entire string will sit in cell [0,n]
If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j

10 LING 180 Autumn 2007

CKY

Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j]
In other words, if we think there might be an A spanning i,j in the input… AND
A -> B C is a rule in the grammar, THEN
there must be a B in [i,k] and a C in [k,j] for some i < k < j
So just loop over the possible k values

11 LING 180 Autumn 2007

CKY Table

• Filling the [i,j]th cell in the CKY table

12 LING 180 Autumn 2007

CKY Algorithm

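The algorithm figure on this slide did not survive extraction, so here is a minimal recognizer sketch of the idea from the preceding slides (the function name and grammar encoding are my own): fill cell [i,j] by looping over every split point k and every rule A -> B C with a B in [i,k] and a C in [k,j].

from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    # lexical: word -> set of non-terminals A with A -> word
    # binary:  list of (A, B, C) triples for rules A -> B C
    n = len(words)
    table = defaultdict(set)                  # table[(i, j)] = non-terminals spanning words[i:j]
    for j in range(1, n + 1):
        table[(j - 1, j)] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):        # a column at a time, bottom to top
            for k in range(i + 1, j):         # loop over the possible split points k
                for A, B, C in binary:
                    if B in table[(i, k)] and C in table[(k, j)]:
                        table[(i, j)].add(A)
    return start in table[(0, n)]             # is there an S spanning the whole input?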
13 LING 180 Autumn 2007

Note

We arranged the loops to fill the table a column at a time, from left to right, bottom to top

This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below)
Are there other ways to fill the table?

14 LING 180 Autumn 2007

0 Book 1 the 2 flight 3 through 4 Houston 5

15 LING 180 Autumn 2007

CYK Example

S  -> NP VP
VP -> V NP
NP -> NP PP
VP -> VP PP
PP -> P NP

NP -> John | Mary | Denver
V  -> called
P  -> from

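This grammar can be fed straight to the recognizer sketched earlier (the encoding below is mine; probabilities play no role yet):

lexical = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}
binary = [("S", "NP", "VP"), ("VP", "V", "NP"),
          ("NP", "NP", "PP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]

print(cky_recognize("John called Mary from Denver".split(), lexical, binary))   # True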
16-31 LING 180 Autumn 2007

Example

John called Mary from Denver

(Slides 16-31 are chart diagrams that do not survive as text. They first show the two parse trees for the PP-attachment ambiguity, one with "from Denver" attached to the VP headed by "called" and one with it attached to the NP "Mary". They then step through filling the CKY chart for the sentence cell by cell, bottom-up: X marks spans over which no constituent can be built, two competing VP analyses (VP1, VP2) appear over "called Mary from Denver", and finally an S spanning the whole input appears in cell [0,5].)

32 LING 180 Autumn 2007

Back to Ambiguity

Did we solve it?

33 LING 180 Autumn 2007

Ambiguity

34 LING 180 Autumn 2007

Ambiguity

No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry
They both efficiently store the sub-parts that are shared between multiple parses
But neither can tell us which one is right
Not a parser – a recognizer

– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time

35 LING 180 Autumn 2007

Converting CKY from Recognizer to Parser

With the addition of a few pointers we have a parser
Augment each new cell in the chart to point to where we came from

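A sketch of that augmentation, extending the hypothetical recognizer shown earlier: each cell entry also records the rule and split point that produced it, so a tree can be read back from cell [0,n]. (Keeping one backpointer per non-terminal per cell yields one parse; keeping all of them would store the whole parse forest.)

def cky_parse(words, lexical, binary, start="S"):
    n = len(words)
    back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]   # back[i][j][A] = how A was built
    for j in range(1, n + 1):
        for A in lexical.get(words[j - 1], set()):
            back[j - 1][j][A] = words[j - 1]
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for A, B, C in binary:
                    if B in back[i][k] and C in back[k][j] and A not in back[i][j]:
                        back[i][j][A] = (k, B, C)                   # point to where we came from

    def build(A, i, j):                                             # follow the backpointers into a tree
        entry = back[i][j][A]
        if isinstance(entry, str):
            return (A, entry)
        k, B, C = entry
        return (A, build(B, i, k), build(C, k, j))

    return build(start, 0, n) if start in back[0][n] else None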
36 LING 180 Autumn 2007

How to do parse disambiguation?

Probabilistic methods
Augment the grammar with probabilities
Then modify the parser to keep only the most probable parses
And at the end, return the most probable parse

37 LING 180 Autumn 2007

Probabilistic CFGs

The probabilistic model: assigning probabilities to parse trees

Getting the probabilities for the model
Parsing with probabilities:

Slight modification to the dynamic programming approach
Task is to find the max probability tree for an input

38 LING 180 Autumn 2007

Probability Model

Attach probabilities to grammar rules
The expansions for a given non-terminal sum to 1:
VP -> Verb          .55
VP -> Verb NP       .40
VP -> Verb NP NP    .05

Read this as P(Specific rule | LHS)

39 LING 180 Autumn 2007

PCFG

40 LING 180 Autumn 2007

PCFG

41 LING 180 Autumn 2007

Probability Model (1)

A derivation (tree) consists of the set of grammar rules that are in the tree

The probability of a tree is just the product of the probabilities of the rules in the derivation

42 LING 180 Autumn 2007

Probability model

P(T,S) = P(T) P(S|T) = P(T), since P(S|T) = 1

P(T,S) = ∏_{n ∈ T} p(r_n)

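In code, this model is just a table of conditional rule probabilities (the P(rule | LHS) numbers from the earlier slide) multiplied over the rules used in a derivation. A sketch, assuming trees are the nested tuples produced by the CKY parser sketch above:

rule_prob = {("VP", ("Verb",)): 0.55,
             ("VP", ("Verb", "NP")): 0.40,
             ("VP", ("Verb", "NP", "NP")): 0.05}    # P(rule | LHS); the expansions of VP sum to 1

def tree_prob(tree, probs):
    # probability of a tree = product of the probabilities of the rules in the derivation
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return probs.get((label, (children[0],)), 1.0)    # lexical rule, e.g. Verb -> "book"
    p = probs[(label, tuple(child[0] for child in children))]
    for child in children:
        p *= tree_prob(child, probs)
    return p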
43 LING 180 Autumn 2007

Probability Model (1.1)

The probability of a word sequence P(S) is the probability of its tree in the unambiguous case
It's the sum of the probabilities of the trees in the ambiguous case

44 LING 180 Autumn 2007

Getting the Probabilities

From an annotated database (a treebank)
So, for example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall

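That relative-frequency estimate in code, over a toy list of rule occurrences (the data format is made up for illustration):

from collections import Counter

def estimate_rule_probs(observed_rules):
    # observed_rules: one (lhs, rhs) pair per rule occurrence in the treebank
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

observed = [("VP", ("Verb", "NP"))] * 4 + [("VP", ("Verb",))] * 5 + [("VP", ("Verb", "NP", "NP"))]
print(estimate_rule_probs(observed)[("VP", ("Verb", "NP"))])   # 4 VP -> Verb NP out of 10 VPs = 0.4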
45 LING 180 Autumn 2007

TreeBanks

46 LING 180 Autumn 2007

Treebanks

47 LING 180 Autumn 2007

Treebanks

48 LING 180 Autumn 2007

Treebank Grammars

49 LING 180 Autumn 2007

Lots of flat rules

50 LING 180 Autumn 2007

Example sentences from those rules

Total: over 17,000 different grammar rules in the 1-million-word Treebank corpus

51 LING 180 Autumn 2007

Probabilistic Grammar Assumptions

We're assuming that there is a grammar to be used to parse with
We're assuming the existence of a large robust dictionary with parts of speech
We're assuming the ability to parse (i.e., a parser)
Given all that… we can parse probabilistically

52 LING 180 Autumn 2007

Typical Approach

Bottom-up (CKY) dynamic programming approach
Assign probabilities to constituents as they are completed and placed in the table
Use the max probability for each constituent going up

53 LING 180 Autumn 2007

What's that last bullet mean?

Say we're talking about a final part of a parse: S -> NP[0,i] VP[i,j]

The probability of the S is… P(S -> NP VP) * P(NP) * P(VP)

The green stuff is already known, since we're doing bottom-up parsing

54 LING 180 Autumn 2007

Max

I said the P(NP) is known
What if there are multiple NPs for the span of text in question (0 to i)?
Take the max (where?)

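A sketch of where the max goes in that bottom-up table (the question the next slide asks). It reuses the hypothetical encoding from the earlier CKY sketches, with each binary rule now carrying its probability:

def pcky_best(words, lexical, binary, start="S"):
    # lexical: word -> {A: P(A -> word)};  binary: list of (A, B, C, P(A -> B C))
    n = len(words)
    best = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]   # best[i][j][A] = max prob of an A over [i,j]
    for j in range(1, n + 1):
        best[j - 1][j] = dict(lexical.get(words[j - 1], {}))
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for A, B, C, p in binary:
                    if B in best[i][k] and C in best[k][j]:
                        q = p * best[i][k][B] * best[k][j][C]       # P(rule) * P(B) * P(C)
                        if q > best[i][j].get(A, 0.0):              # the max goes in the cell:
                            best[i][j][A] = q                       # keep only the most probable A per span
    return best[0][n].get(start, 0.0)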
55 LING 180 Autumn 2007

CKY: Where does the max go?

56 LING 180 Autumn 2007

Problems with PCFGs

The probability model we're using is just based on the rules in the derivation…

Doesn't use the words in any real way
Doesn't take into account where in the derivation a rule is used

57 LING 180 Autumn 2007

Solution

Add lexical dependencies to the scheme…
Infiltrate the predilections of particular words into the probabilities in the derivation
I.e., condition the rule probabilities on the actual words

58 LING 180 Autumn 2007

Heads

To do that, we're going to make use of the notion of the head of a phrase

The head of an NP is its noun
The head of a VP is its verb
The head of a PP is its preposition

(It's really more complicated than that, but this will do)

59 LING 180 Autumn 2007

Example (right)

Attribute grammar

60 LING 180 Autumn 2007

Example (wrong)

61 LING 180 Autumn 2007

How?

We used to have: VP -> V NP PP with P(rule|VP)

– That's the count of this rule divided by the number of VPs in a treebank

Now we have: VP(dumped) -> V(dumped) NP(sacks) PP(in) with
P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
Not likely to have significant counts in any treebank

62 LING 180 Autumn 2007

Declare Independence

When stuck, exploit independence and collect the statistics you can…
We'll focus on capturing two things:

Verb subcategorization
– Particular verbs have affinities for particular VPs

Objects' affinities for their predicates (mostly their mothers and grandmothers)

– Some objects fit better with some predicates than others

63 LING 180 Autumn 2007

Subcategorization

Condition particular VP rules on their head… so for rule r: VP -> V NP PP,
P(r|VP) becomes

P(r | VP ^ dumped)

What's the count?
How many times was this rule used with (head) dump,

divided by the number of VPs that dump appears (as head) in total

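A sketch of that estimate, assuming we have already extracted one (head verb, rule right-hand side) pair per VP node from a treebank (the extraction step is not shown, and the counts are invented):

from collections import Counter

def subcat_probs(vp_observations):
    # vp_observations: one (head_verb, rhs_tuple) pair per VP node in the treebank
    rule_counts = Counter(vp_observations)
    head_counts = Counter(head for head, _ in vp_observations)
    # P(r | VP, head) = count(head used with rule r) / count(VPs with that head)
    return {(head, rhs): c / head_counts[head] for (head, rhs), c in rule_counts.items()}

obs = [("dumped", ("V", "NP", "PP"))] * 3 + [("dumped", ("V", "NP"))] + [("slept", ("V",))] * 2
print(subcat_probs(obs)[("dumped", ("V", "NP", "PP"))])   # 3 of the 4 dumped-headed VPs = 0.75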
64 LING 180 Autumn 2007

Preferences

Subcat captures the affinity between VP heads (verbs) and the VP rules they go with
What about the affinity between VP heads and the heads of the other daughters of the VP?
Back to our examples…

65 LING 180 Autumn 2007

Example (right)

66 LING 180 Autumn 2007

Example (wrong)

67 LING 180 Autumn 2007

Preferences

The issue here is the attachment of the PP, so the affinities we care about are the ones between dumped and into vs. sacks and into
So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize
Vs. the situation where sacks is a constituent with into as the head of a PP daughter

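A sketch of that comparison as relative frequencies, assuming (head, PP-head) pairs for both attachment configurations have already been pulled out of a treebank (the counts below are invented for illustration):

from collections import Counter

def pp_attach_preference(vp_attachments, np_attachments, verb, noun, prep):
    # each list holds (head, pp_head) pairs observed in the treebank
    vp_counts, np_counts = Counter(vp_attachments), Counter(np_attachments)
    p_vp = vp_counts[(verb, prep)] / sum(c for (h, _), c in vp_counts.items() if h == verb)
    p_np = np_counts[(noun, prep)] / sum(c for (h, _), c in np_counts.items() if h == noun)
    return "attach to VP" if p_vp > p_np else "attach to NP"

vp_obs = [("dumped", "into")] * 6 + [("dumped", "with")] * 2
np_obs = [("sacks", "of")] * 5 + [("sacks", "into")]
print(pp_attach_preference(vp_obs, np_obs, "dumped", "sacks", "into"))   # attach to VP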
68 LING 180 Autumn 2007

Preferences (2)

Consider the VPs:
Ate spaghetti with gusto
Ate spaghetti with marinara

The affinity of gusto for eat is much larger than its affinity for spaghetti
On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate

69 LING 180 Autumn 2007

Preferences (2)

Note the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs

(Figure: two lexicalized trees, VP(ate) over "Ate spaghetti with gusto" with the PP(with) attached to the VP, and VP(ate) over "Ate spaghetti with marinara" with the PP(with) attached to the NP(spaghetti).)

70 LING 180 Autumn 2007

Summary

Context-Free Grammars
Parsing

Top Down, Bottom Up Metaphors
Dynamic Programming Parsers: CKY, Earley

Disambiguation
PCFG
Probabilistic Augmentations to Parsers
Treebanks

71 LING 180 Autumn 2007

Optional section: the Earley algorithm

72 LING 180 Autumn 2007

Problem (minor)

We said CKY requires the grammar to be binary (i.e., in Chomsky Normal Form)
We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal
Except when you change the grammar, the trees come out wrong
All things being equal, we'd prefer to leave the grammar alone

73 LING 180 Autumn 2007

Earley Parsing

Allows arbitrary CFGs
Where CKY is bottom-up, Earley is top-down

Fills a table in a single sweep over the input words
Table is length N+1; N is the number of words
Table entries represent:

– Completed constituents and their locations
– In-progress constituents
– Predicted constituents

74 LING 180 Autumn 2007

States

The table entries are called states and are represented with dotted rules:

S -> · VP                A VP is predicted

NP -> Det · Nominal      An NP is in progress

VP -> V NP ·             A VP has been found

75 LING 180 Autumn 2007

States/Locations

It would be nice to know where these things are in the input, so…

S -> · VP [0,0]              A VP is predicted at the start of the sentence

NP -> Det · Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2

VP -> V NP · [0,3]           A VP has been found starting at 0 and ending at 3

76 LING 180 Autumn 2007

Graphically

77 LING 180 Autumn 2007

Earley

As with most dynamic programming approaches, the answer is found by looking in the table in the right place
In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete
If that's the case, you're done

S -> α · [0,n+1]

78 LING 180 Autumn 2007

Earley Algorithm

March through the chart left-to-right
At each step, apply one of 3 operators:

Predictor
– Create new states representing top-down expectations

Scanner
– Match word predictions (rule with word after dot) to words

Completer
– When a state is complete, see what rules were looking for that completed constituent

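A compact sketch of those three operators over dotted-rule states (my own illustrative implementation of the description above, not the course's code; part-of-speech categories are handled with a simple word-to-tag lexicon):

class State:
    def __init__(self, lhs, rhs, dot, start, end):
        self.lhs, self.rhs, self.dot, self.start, self.end = lhs, rhs, dot, start, end
    def complete(self):
        return self.dot == len(self.rhs)
    def next_cat(self):
        return None if self.complete() else self.rhs[self.dot]
    def key(self):
        return (self.lhs, self.rhs, self.dot, self.start, self.end)

def earley_recognize(words, grammar, lexicon, start="S"):
    # grammar: non-terminal -> list of right-hand-side tuples; lexicon: word -> set of POS tags
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    seen = [set() for _ in range(n + 1)]

    def add(state, i):                                   # enqueue a state once per chart entry
        if state.key() not in seen[i]:
            seen[i].add(state.key())
            chart[i].append(state)

    add(State("GAMMA", (start,), 0, 0, 0), 0)            # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                           # chart[i] grows while we march through it
            cat = state.next_cat()
            if cat in grammar:                           # PREDICTOR: add top-down expectations
                for rhs in grammar[cat]:
                    add(State(cat, rhs, 0, i, i), i)
            elif cat is not None:                        # SCANNER: cat is a part-of-speech category
                if i < n and cat in lexicon.get(words[i], set()):
                    add(State(state.lhs, state.rhs, state.dot + 1, state.start, i + 1), i + 1)
            else:                                        # COMPLETER: advance states waiting for state.lhs
                for older in chart[state.start]:
                    if older.next_cat() == state.lhs:
                        add(State(older.lhs, older.rhs, older.dot + 1, older.start, i), i)
    return any(s.lhs == "GAMMA" and s.complete() for s in chart[n])

For the "Book that flight" example on the later slides, a toy grammar with an imperative S -> VP rule (the grammar and lexicon here are mine, not the course's sample grammar) recognizes the sentence:

grammar = {"S": [("NP", "VP"), ("VP",)], "VP": [("Verb", "NP")],
           "NP": [("Det", "Nominal")], "Nominal": [("Noun",)]}
lexicon = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
print(earley_recognize("book that flight".split(), grammar, lexicon))   # True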
79 LING 180 Autumn 2007

Predictor

Given a state
With a non-terminal to the right of the dot
That is not a part-of-speech category:
Create a new state for each expansion of the non-terminal
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So the predictor looking at

– S -> · VP [0,0]

results in
– VP -> · Verb [0,0]
– VP -> · Verb NP [0,0]

80 LING 180 Autumn 2007

Scanner

Given a state
With a non-terminal to the right of the dot
That is a part-of-speech category:
If the next word in the input matches this part-of-speech,
Create a new state with the dot moved over the non-terminal
So the scanner looking at

– VP -> · Verb NP [0,0]
If the next word, "book", can be a verb, add the new state

– VP -> Verb · NP [0,1]
Add this state to the chart entry following the current one
Note: the Earley algorithm uses top-down input to disambiguate POS
Only a POS predicted by some state can get added to the chart

81 LING 180 Autumn 2007

Completer

Applied to a state when its dot has reached the right end of the rule
Parser has discovered a category over some span of input
Find and advance all previous states that were looking for this category

Copy state, move dot, insert in current chart entry
Given

NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]

Add
VP -> Verb NP · [0,3]

82 LING 180 Autumn 2007

Earley: how do we know we are done?

How do we know when we are done?
Find an S state in the final column that spans from 0 to n+1 and is complete
If that's the case, you're done

S -> α · [0,n+1]

83 LING 180 Autumn 2007

Earley

So sweep through the table from 0 to n+1…
New predicted states are created by starting top-down from S
New incomplete states are created by advancing existing states as new constituents are discovered
New complete states are created in the same way

84 LING 180 Autumn 2007

Earley

More specifically…
1. Predict all the states you can upfront
2. Read a word

   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2

3. Look at N+1 to see if you have a winner

85 LING 180 Autumn 2007

Example

Book that flight
We should find… an S from 0 to 3 that is a completed state…

86 LING 180 Autumn 2007

Example

87 LING 180 Autumn 2007

Example

88 LING 180 Autumn 2007

Example

89 LING 180 Autumn 2007

Details

What kind of algorithms did we just describe (both Earley and CKY)?

Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time

90 LING 180 Autumn 2007

Back to Ambiguity

Did we solve it?

91 LING 180 Autumn 2007

Ambiguity

92 LING 180 Autumn 2007

Ambiguity

No…
Both CKY and Earley will result in multiple S structures for the [0,n] table entry
They both efficiently store the sub-parts that are shared between multiple parses
But neither can tell us which one is right
Not a parser – a recognizer

– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That's how we solve (not) an exponential problem in polynomial time

93 LING 180 Autumn 2007

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parser
Augment the "Completer" to point to where we came from

94 LING 180 Autumn 2007

Augmenting the chart with structural information

(Figure: a chart in which states S8-S13 are annotated with backpointers to the earlier states, such as S8 and S9, from which they were built.)

95 LING 180 Autumn 2007

Retrieving Parse Trees from Chart

All the possible parses for an input are in the table
We just need to read off all the backpointers from every complete S in the last column of the table
Find all the S -> X · [0,N+1]
Follow the structural traces from the Completer
Of course this won't be polynomial time, since there could be an exponential number of trees
So we can at least represent ambiguity efficiently

Page 2: Dynamic Programming The CKY (Cocke-Kasami-Younger) Algorithmstaff.icar.cnr.it/ruffolo/progetti/projects/09... · A sig n rob al t ec u hy com p led ani h b Use the max probability

2

7LING 180 Autumn 2007

Sample Grammar

8LING 180 Autumn 2007

Back to CKY Parsing

Given rules in CNFConsider the rule A -gt BC

If there is an A in the input then there must be a Bfollowed by a C in the inputIf the A goes from i to j in the input then there mustbe some k st iltkltj

ndash Ie The B splits from the C someplace

9LING 180 Autumn 2007

CKY

So letrsquos build a table so that an A spanning from i to jin the input is placed in cell [ij] in the tableSo a non-terminal spanning an entire string will sit incell [0 n]If we build the table bottom up wersquoll know that theparts of the A must go from i to k and from k to j

10LING 180 Autumn 2007

CKY

Meaning that for a rule like A -gt B C we should lookfor a B in [ik] and a C in [kj]In other words if we think there might be an Aspanning ij in the inputhellip ANDA -gt B C is a rule in the grammar THENThere must be a B in [ik] and a C in [kj] for someiltkltjSo just loop over the possible k values

11LING 180 Autumn 2007

CKY Table

bullFilling the[ij]th cell inthe CKY table

12LING 180 Autumn 2007

CKY Algorithm

3

13LING 180 Autumn 2007

Note

We arranged the loops to fill the table a column at atime from left to right bottom to top

This assures us that whenever wersquore filling a cell theparts needed to fill it are already in the table (to theleft and below)Are there other ways to fill the table

14LING 180 Autumn 2007

0 Book 1 the 2 flight 3 through 4 Houston 5

15LING 180 Autumn 2007

CYK Example

S -gt NP VPVP -gt V NPNP -gt NP PPVP -gt VP PPPP -gt P NP

NP -gt John Mary DenverV -gt calledP -gt from

16LING 180 Autumn 2007

Example

John called Mary from Denver

S

VP PP

NP VP

V NP NPP

17LING 180 Autumn 2007

Example

John called Mary from Denver

S

PP

NP VP

NP

NP

V

18LING 180 Autumn 2007

Example

NP

John

calledNP

MaryV

fromNP

DenverP

4

19LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNP

DenverP

20LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverP

21LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverPX

22LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVP

DenverPX

23LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryV

fromNPVPS

DenverPX

24LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVPS

DenverPXX

5

25LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPX

26LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

27LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

28LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

29LING 180 Autumn 2007

Example

NPPPNPVP1VP2

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

30LING 180 Autumn 2007

Example

NPPPNPVP1VP2

S

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

6

31LING 180 Autumn 2007

Example

NPPPNPVPS

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

32LING 180 Autumn 2007

Back to Ambiguity

Did we solve it

33LING 180 Autumn 2007

Ambiguity

34LING 180 Autumn 2007

Ambiguity

NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer

ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition

ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

35LING 180 Autumn 2007

Converting CKY from Recognizerto Parser

With the addition of a few pointers we have a parserAugment each new cell in chart to point to where wecame from

36LING 180 Autumn 2007

How to do parse disambiguation

Probabilistic methodsAugment the grammar with probabilitiesThen modify the parser to keep only most probableparsesAnd at the end return the most probable parse

7

37LING 180 Autumn 2007

Probabilistic CFGs

The probabilistic modelAssigning probabilities to parse trees

Getting the probabilities for the modelParsing with probabilities

Slight modification to dynamic programmingapproachTask is to find the max probability tree for an input

38LING 180 Autumn 2007

Probability Model

Attach probabilities to grammar rulesThe expansions for a given non-terminal sum to 1VP -gt Verb 55VP -gt Verb NP 40VP -gt Verb NP NP 05

Read this as P(Specific rule | LHS)

39LING 180 Autumn 2007

PCFG

40LING 180 Autumn 2007

PCFG

41LING 180 Autumn 2007

Probability Model (1)

A derivation (tree) consists of the set of grammarrules that are in the tree

The probability of a tree is just the product of theprobabilities of the rules in the derivation

42LING 180 Autumn 2007

Probability model

P(TS) = P(T)P(S|T) = P(T) since P(S|T)=1

P(TS) = p(rn )nT

8

43LING 180 Autumn 2007

Probability Model (11)

The probability of a word sequence P(S) is theprobability of its tree in the unambiguous caseItrsquos the sum of the probabilities of the trees in theambiguous case

44LING 180 Autumn 2007

Getting the Probabilities

From an annotated database (a treebank)So for example to get the probability for a particularVP rule just count all the times the rule is used anddivide by the number of VPs overall

45LING 180 Autumn 2007

TreeBanks

46LING 180 Autumn 2007

Treebanks

47LING 180 Autumn 2007

Treebanks

48LING 180 Autumn 2007

Treebank Grammars

9

49LING 180 Autumn 2007

Lots of flat rules

50LING 180 Autumn 2007

Example sentences from thoserules

Total over 17000 different grammar rules in the 1-million word Treebank corpus

51LING 180 Autumn 2007

Probabilistic GrammarAssumptions

Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically

52LING 180 Autumn 2007

Typical Approach

Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up

53LING 180 Autumn 2007

Whatrsquos that last bullet mean

Say wersquore talking about a final part of a parseS-gt0NPiVPj

The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)

The green stuff is already known Wersquore doing bottom-up parsing

54LING 180 Autumn 2007

Max

I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)

10

55LING 180 Autumn 2007

CKY Where does the max go

56LING 180 Autumn 2007

Problems with PCFGs

The probability model wersquore using is just based on therules in the derivationhellip

Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used

57LING 180 Autumn 2007

Solution

Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords

58LING 180 Autumn 2007

Heads

To do that wersquore going to make use of the notion ofthe head of a phrase

The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition

(Itrsquos really more complicated than that but this will do)

59LING 180 Autumn 2007

Example (right)

Attribute grammar

60LING 180 Autumn 2007

Example (wrong)

11

61LING 180 Autumn 2007

How

We used to haveVP -gt V NP PP P(rule|VP)

ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank

Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank

62LING 180 Autumn 2007

Declare Independence

When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things

Verb subcategorizationndash Particular verbs have affinities for particular VPs

Objects affinities for their predicates (mostly theirmothers and grandmothers)

ndash Some objects fit better with some predicates thanothers

63LING 180 Autumn 2007

Subcategorization

Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes

P(r | VP ^ dumped)

Whatrsquos the countHow many times was this rule used with (head) dump

divided by the number of VPs that dump appears (ashead) in total

64LING 180 Autumn 2007

Preferences

Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip

65LING 180 Autumn 2007

Example (right)

66LING 180 Autumn 2007

Example (wrong)

12

67LING 180 Autumn 2007

Preferences

The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter

68LING 180 Autumn 2007

Preferences (2)

Consider the VPsAte spaghetti with gustoAte spaghetti with marinara

The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate

69LING 180 Autumn 2007

Preferences (2)

Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs

Vp (ate) Vp(ate)

Vp(ate) Pp(with)Pp(with)

Np(spag)

npvvAte spaghetti with marinaraAte spaghetti with gusto

np

70LING 180 Autumn 2007

Summary

Context-Free GrammarsParsing

Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley

DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks

71LING 180 Autumn 2007

Optional section the Earley alg

72LING 180 Autumn 2007

Problem (minor)

We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone

13

73LING 180 Autumn 2007

Earley Parsing

Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents

74LING 180 Autumn 2007

States

The table-entries are called states and are representedwith dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

75LING 180 Autumn 2007

StatesLocations

It would be nice to know where these things are in the inputsohellip

S -gt VP [00] A VP is predicted at the start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

76LING 180 Autumn 2007

Graphically

77LING 180 Autumn 2007

Earley

As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done

S ndashgt α [0n+1]

78LING 180 Autumn 2007

Earley Algorithm

March through chart left-to-rightAt each step apply 1 of 3 operators

Predictorndash Create new states representing top-down expectations

Scannerndash Match word predictions (rule with word after dot) to

words

Completerndash When a state is complete see what rules were looking

for that completed constituent

14

79LING 180 Autumn 2007

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at

ndash S -gt VP [00]

results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]

80LING 180 Autumn 2007

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at

ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state

ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart

81LING 180 Autumn 2007

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category

copy state move dot insert in current chart entryGiven

NP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

82LING 180 Autumn 2007

Earley how do we know we aredone

How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done

S ndashgt α [0n+1]

83LING 180 Autumn 2007

Earley

So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way

84LING 180 Autumn 2007

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

15

85LING 180 Autumn 2007

Example

Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip

86LING 180 Autumn 2007

Example

87LING 180 Autumn 2007

Example

88LING 180 Autumn 2007

Example

89LING 180 Autumn 2007

Details

What kind of algorithms did we just describe (bothEarley and CKY)

Not parsers ndash recognizersndash The presence of an S state with the right attributes in

the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

90LING 180 Autumn 2007

Back to Ambiguity

Did we solve it

16

91LING 180 Autumn 2007

Ambiguity

92LING 180 Autumn 2007

Ambiguity

NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer

ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition

ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

93LING 180 Autumn 2007

Converting Earley fromRecognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom

94LING 180 Autumn 2007

Augmenting the chart withstructural information

S8S9

S10

S11

S13S12

S8

S9

S8

95LING 180 Autumn 2007

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently

Page 3: Dynamic Programming The CKY (Cocke-Kasami-Younger) Algorithmstaff.icar.cnr.it/ruffolo/progetti/projects/09... · A sig n rob al t ec u hy com p led ani h b Use the max probability

3

13LING 180 Autumn 2007

Note

We arranged the loops to fill the table a column at atime from left to right bottom to top

This assures us that whenever wersquore filling a cell theparts needed to fill it are already in the table (to theleft and below)Are there other ways to fill the table

14LING 180 Autumn 2007

0 Book 1 the 2 flight 3 through 4 Houston 5

15LING 180 Autumn 2007

CYK Example

S -gt NP VPVP -gt V NPNP -gt NP PPVP -gt VP PPPP -gt P NP

NP -gt John Mary DenverV -gt calledP -gt from

16LING 180 Autumn 2007

Example

John called Mary from Denver

S

VP PP

NP VP

V NP NPP

17LING 180 Autumn 2007

Example

John called Mary from Denver

S

PP

NP VP

NP

NP

V

18LING 180 Autumn 2007

Example

NP

John

calledNP

MaryV

fromNP

DenverP

4

19LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNP

DenverP

20LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverP

21LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverPX

22LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVP

DenverPX

23LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryV

fromNPVPS

DenverPX

24LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVPS

DenverPXX

5

25LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPX

26LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

27LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

28LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

29LING 180 Autumn 2007

Example

NPPPNPVP1VP2

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

30LING 180 Autumn 2007

Example

NPPPNPVP1VP2

S

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

6

31LING 180 Autumn 2007

Example

NPPPNPVPS

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

32LING 180 Autumn 2007

Back to Ambiguity

Did we solve it

33LING 180 Autumn 2007

Ambiguity

34LING 180 Autumn 2007

Ambiguity

NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer

ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition

ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

35LING 180 Autumn 2007

Converting CKY from Recognizerto Parser

With the addition of a few pointers we have a parserAugment each new cell in chart to point to where wecame from

36LING 180 Autumn 2007

How to do parse disambiguation

Probabilistic methodsAugment the grammar with probabilitiesThen modify the parser to keep only most probableparsesAnd at the end return the most probable parse

7

37LING 180 Autumn 2007

Probabilistic CFGs

The probabilistic modelAssigning probabilities to parse trees

Getting the probabilities for the modelParsing with probabilities

Slight modification to dynamic programmingapproachTask is to find the max probability tree for an input

38LING 180 Autumn 2007

Probability Model

Attach probabilities to grammar rulesThe expansions for a given non-terminal sum to 1VP -gt Verb 55VP -gt Verb NP 40VP -gt Verb NP NP 05

Read this as P(Specific rule | LHS)

39LING 180 Autumn 2007

PCFG

40LING 180 Autumn 2007

PCFG

41LING 180 Autumn 2007

Probability Model (1)

A derivation (tree) consists of the set of grammarrules that are in the tree

The probability of a tree is just the product of theprobabilities of the rules in the derivation

42LING 180 Autumn 2007

Probability model

P(TS) = P(T)P(S|T) = P(T) since P(S|T)=1

P(TS) = p(rn )nT

8

43LING 180 Autumn 2007

Probability Model (11)

The probability of a word sequence P(S) is theprobability of its tree in the unambiguous caseItrsquos the sum of the probabilities of the trees in theambiguous case

44LING 180 Autumn 2007

Getting the Probabilities

From an annotated database (a treebank)So for example to get the probability for a particularVP rule just count all the times the rule is used anddivide by the number of VPs overall

45LING 180 Autumn 2007

TreeBanks

46LING 180 Autumn 2007

Treebanks

47LING 180 Autumn 2007

Treebanks

48LING 180 Autumn 2007

Treebank Grammars

9

49LING 180 Autumn 2007

Lots of flat rules

50LING 180 Autumn 2007

Example sentences from thoserules

Total over 17000 different grammar rules in the 1-million word Treebank corpus

51LING 180 Autumn 2007

Probabilistic GrammarAssumptions

Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically

52LING 180 Autumn 2007

Typical Approach

Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up

53LING 180 Autumn 2007

Whatrsquos that last bullet mean

Say wersquore talking about a final part of a parseS-gt0NPiVPj

The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)

The green stuff is already known Wersquore doing bottom-up parsing

54LING 180 Autumn 2007

Max

I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)

10

55LING 180 Autumn 2007

CKY Where does the max go

56LING 180 Autumn 2007

Problems with PCFGs

The probability model wersquore using is just based on therules in the derivationhellip

Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used

57LING 180 Autumn 2007

Solution

Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords

58LING 180 Autumn 2007

Heads

To do that wersquore going to make use of the notion ofthe head of a phrase

The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition

(Itrsquos really more complicated than that but this will do)

59LING 180 Autumn 2007

Example (right)

Attribute grammar

60LING 180 Autumn 2007

Example (wrong)

11

61LING 180 Autumn 2007

How

We used to haveVP -gt V NP PP P(rule|VP)

ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank

Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank

62LING 180 Autumn 2007

Declare Independence

When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things

Verb subcategorizationndash Particular verbs have affinities for particular VPs

Objects affinities for their predicates (mostly theirmothers and grandmothers)

ndash Some objects fit better with some predicates thanothers

63LING 180 Autumn 2007

Subcategorization

Condition particular VP rules on their headhellip so r VP -gt V NP PP P(r|VP)Becomes

P(r | VP ^ dumped)

Whatrsquos the countHow many times was this rule used with (head) dump

divided by the number of VPs that dump appears (ashead) in total

64LING 180 Autumn 2007

Preferences

Subcat captures the affinity between VP heads(verbs) and the VP rules they go withWhat about the affinity between VP heads and theheads of the other daughters of the VPBack to our exampleshellip

65LING 180 Autumn 2007

Example (right)

66LING 180 Autumn 2007

Example (wrong)

12

67LING 180 Autumn 2007

Preferences

The issue here is the attachment of the PP So theaffinities we care about are the ones betweendumped and into vs sacks and intoSo count the places where dumped is the head of aconstituent that has a PP daughter with into as itshead and normalizeVs the situation where sacks is a constituent withinto as the head of a PP daughter

68LING 180 Autumn 2007

Preferences (2)

Consider the VPsAte spaghetti with gustoAte spaghetti with marinara

The affinity of gusto for eat is much larger than itsaffinity for spaghettiOn the other hand the affinity of marinara forspaghetti is much higher than its affinity for ate

69LING 180 Autumn 2007

Preferences (2)

Note the relationship here is more distant anddoesnrsquot involve a headword since gusto andmarinara arenrsquot the heads of the PPs

Vp (ate) Vp(ate)

Vp(ate) Pp(with)Pp(with)

Np(spag)

npvvAte spaghetti with marinaraAte spaghetti with gusto

np

70LING 180 Autumn 2007

Summary

Context-Free GrammarsParsing

Top Down Bottom Up MetaphorsDynamic Programming Parsers CKY Earley

DisambiguationPCFGProbabilistic Augmentations to ParsersTreebanks

71LING 180 Autumn 2007

Optional section the Earley alg

72LING 180 Autumn 2007

Problem (minor)

We said CKY requires the grammar to be binary (ieIn Chomsky-Normal Form)We showed that any arbitrary CFG can be convertedto Chomsky-Normal Form so thatrsquos not a huge dealExcept when you change the grammar the treescome out wrongAll things being equal wersquod prefer to leave thegrammar alone

13

73LING 180 Autumn 2007

Earley Parsing

Allows arbitrary CFGsWhere CKY is bottom-up Earley is top-down

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

ndash Completed constituents and their locationsndash In-progress constituentsndash Predicted constituents

74LING 180 Autumn 2007

States

The table-entries are called states and are representedwith dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

75LING 180 Autumn 2007

StatesLocations

It would be nice to know where these things are in the inputsohellip

S -gt VP [00] A VP is predicted at the start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

76LING 180 Autumn 2007

Graphically

77LING 180 Autumn 2007

Earley

As with most dynamic programming approaches theanswer is found by looking in the table in the rightplaceIn this case there should be an S state in the finalcolumn that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done

S ndashgt α [0n+1]

78LING 180 Autumn 2007

Earley Algorithm

March through chart left-to-rightAt each step apply 1 of 3 operators

Predictorndash Create new states representing top-down expectations

Scannerndash Match word predictions (rule with word after dot) to

words

Completerndash When a state is complete see what rules were looking

for that completed constituent

14

79LING 180 Autumn 2007

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated statebeginning and ending where generating state endsSo predictor looking at

ndash S -gt VP [00]

results inndash VP -gt Verb [00]ndash VP -gt Verb NP [00]

80LING 180 Autumn 2007

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechCreate a new state with dot moved over the non-terminalSo scanner looking at

ndash VP -gt Verb NP [00]If the next word ldquobookrdquo can be a verb add new state

ndash VP -gt Verb NP [01]Add this state to chart entry following current oneNote Earley algorithm uses top-down input to disambiguate POSOnly POS predicted by some state can get added to chart

81LING 180 Autumn 2007

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category

copy state move dot insert in current chart entryGiven

NP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

82LING 180 Autumn 2007

Earley how do we know we aredone

How do we know when we are doneFind an S state in the final column that spans from 0to n+1 and is completeIf thatrsquos the case yoursquore done

S ndashgt α [0n+1]

83LING 180 Autumn 2007

Earley

So sweep through the table from 0 to n+1hellipNew predicted states are created by starting top-down from SNew incomplete states are created by advancingexisting states as new constituents are discoveredNew complete states are created in the same way

84LING 180 Autumn 2007

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

15

85LING 180 Autumn 2007

Example

Book that flightWe should findhellip an S from 0 to 3 that is a completedstatehellip

86LING 180 Autumn 2007

Example

87LING 180 Autumn 2007

Example

88LING 180 Autumn 2007

Example

89LING 180 Autumn 2007

Details

What kind of algorithms did we just describe (bothEarley and CKY)

Not parsers ndash recognizersndash The presence of an S state with the right attributes in

the right place indicates a successful recognitionndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

90LING 180 Autumn 2007

Back to Ambiguity

Did we solve it

16

91LING 180 Autumn 2007

Ambiguity

92LING 180 Autumn 2007

Ambiguity

NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer

ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition

ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

93LING 180 Autumn 2007

Converting Earley fromRecognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we camefrom

94LING 180 Autumn 2007

Augmenting the chart withstructural information

S8S9

S10

S11

S13S12

S8

S9

S8

95LING 180 Autumn 2007

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S inthe last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be anexponential number of treesSo we can at least represent ambiguity efficiently

Page 4: Dynamic Programming The CKY (Cocke-Kasami-Younger) Algorithmstaff.icar.cnr.it/ruffolo/progetti/projects/09... · A sig n rob al t ec u hy com p led ani h b Use the max probability

4

19LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNP

DenverP

20LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverP

21LING 180 Autumn 2007

Example

NP

John

calledNP

MaryVX

fromNPVP

DenverPX

22LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVP

DenverPX

23LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryV

fromNPVPS

DenverPX

24LING 180 Autumn 2007

Example

NPPP

John

calledNP

MaryVX

fromNPVPS

DenverPXX

5

25LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPX

26LING 180 Autumn 2007

Example

NPPPNP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

27LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

28LING 180 Autumn 2007

Example

NPPPNPVP

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

29LING 180 Autumn 2007

Example

NPPPNPVP1VP2

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

30LING 180 Autumn 2007

Example

NPPPNPVP1VP2

S

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

6

31LING 180 Autumn 2007

Example

NPPPNPVPS

John

calledNP

MaryVX

fromNPVPS

DenverPXXX

32LING 180 Autumn 2007

Back to Ambiguity

Did we solve it

33LING 180 Autumn 2007

Ambiguity

34LING 180 Autumn 2007

Ambiguity

NohellipBoth CKY and Earley will result in multiple Sstructures for the [0n] table entryThey both efficiently store the sub-parts that areshared between multiple parsesBut neither can tell us which one is rightNot a parser ndash a recognizer

ndash The presence of an S state with the right attributes inthe right place indicates a successful recognition

ndash But no parse treehellip no parserndash Thatrsquos how we solve (not) an exponential problem in

polynomial time

35LING 180 Autumn 2007

Converting CKY from Recognizerto Parser

With the addition of a few pointers we have a parserAugment each new cell in chart to point to where wecame from

36LING 180 Autumn 2007

How to do parse disambiguation

Probabilistic methodsAugment the grammar with probabilitiesThen modify the parser to keep only most probableparsesAnd at the end return the most probable parse

7

37LING 180 Autumn 2007

Probabilistic CFGs

The probabilistic modelAssigning probabilities to parse trees

Getting the probabilities for the modelParsing with probabilities

Slight modification to dynamic programmingapproachTask is to find the max probability tree for an input

38LING 180 Autumn 2007

Probability Model

Attach probabilities to grammar rulesThe expansions for a given non-terminal sum to 1VP -gt Verb 55VP -gt Verb NP 40VP -gt Verb NP NP 05

Read this as P(Specific rule | LHS)

39LING 180 Autumn 2007

PCFG

40LING 180 Autumn 2007

PCFG

41LING 180 Autumn 2007

Probability Model (1)

A derivation (tree) consists of the set of grammarrules that are in the tree

The probability of a tree is just the product of theprobabilities of the rules in the derivation

42LING 180 Autumn 2007

Probability model

P(TS) = P(T)P(S|T) = P(T) since P(S|T)=1

P(TS) = p(rn )nT

8

43LING 180 Autumn 2007

Probability Model (11)

The probability of a word sequence P(S) is theprobability of its tree in the unambiguous caseItrsquos the sum of the probabilities of the trees in theambiguous case

44LING 180 Autumn 2007

Getting the Probabilities

From an annotated database (a treebank)So for example to get the probability for a particularVP rule just count all the times the rule is used anddivide by the number of VPs overall

45LING 180 Autumn 2007

TreeBanks

46LING 180 Autumn 2007

Treebanks

47LING 180 Autumn 2007

Treebanks

48LING 180 Autumn 2007

Treebank Grammars

9

49LING 180 Autumn 2007

Lots of flat rules

50LING 180 Autumn 2007

Example sentences from thoserules

Total over 17000 different grammar rules in the 1-million word Treebank corpus

51LING 180 Autumn 2007

Probabilistic GrammarAssumptions

Wersquore assuming that there is a grammar to be used to parsewithWersquore assuming the existence of a large robust dictionarywith parts of speechWersquore assuming the ability to parse (ie a parser)Given all thathellip we can parse probabilistically

52LING 180 Autumn 2007

Typical Approach

Bottom-up (CKY) dynamic programming approachAssign probabilities to constituents as they arecompleted and placed in the tableUse the max probability for each constituent going up

53LING 180 Autumn 2007

Whatrsquos that last bullet mean

Say wersquore talking about a final part of a parseS-gt0NPiVPj

The probability of the S ishellipP(S-gtNP VP)P(NP)P(VP)

The green stuff is already known Wersquore doing bottom-up parsing

54LING 180 Autumn 2007

Max

I said the P(NP) is knownWhat if there are multiple NPs for the span of text inquestion (0 to i)Take the max (where)

10

55LING 180 Autumn 2007

CKY Where does the max go

56LING 180 Autumn 2007

Problems with PCFGs

The probability model wersquore using is just based on therules in the derivationhellip

Doesnrsquot use the words in any real wayDoesnrsquot take into account where in the derivation arule is used

57LING 180 Autumn 2007

Solution

Add lexical dependencies to the schemehellipInfiltrate the predilections of particular words into theprobabilities in the derivationIe Condition the rule probabilities on the actualwords

58LING 180 Autumn 2007

Heads

To do that wersquore going to make use of the notion ofthe head of a phrase

The head of an NP is its nounThe head of a VP is its verbThe head of a PP is its preposition

(Itrsquos really more complicated than that but this will do)

59LING 180 Autumn 2007

Example (right)

Attribute grammar

60LING 180 Autumn 2007

Example (wrong)

11

61LING 180 Autumn 2007

How

We used to haveVP -gt V NP PP P(rule|VP)

ndash Thatrsquos the count of this rule divided by the number ofVPs in a treebank

Now we haveVP(dumped)-gt V(dumped) NP(sacks)PP(in)P(r|VP ^ dumped is the verb ^ sacks is the head ofthe NP ^ in is the head of the PP)Not likely to have significant counts in any treebank

62LING 180 Autumn 2007

Declare Independence

When stuck exploit independence and collect thestatistics you canhellipWersquoll focus on capturing two things

Verb subcategorizationndash Particular verbs have affinities for particular VPs

Objects affinities for their predicates (mostly theirmothers and grandmothers)

ndash Some objects fit better with some predicates thanothers

63LING 180 Autumn 2007

Subcategorization

Condition particular VP rules on their head... so for r: VP -> V NP PP, P(r|VP) becomes

P(r | VP ^ dumped)

What's the count? How many times was this rule used with (head) dump,

divided by the number of VPs that dump appears in (as head) in total
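A small worked version of that estimate, with invented counts (the numbers are made up purely to show the arithmetic):

from collections import Counter

# Hypothetical counts of VP rules broken down by the VP's head verb.
vp_rule_counts = Counter({
    ("dumped", "VP -> V NP PP"): 6,
    ("dumped", "VP -> V NP"):    9,
    ("dumped", "VP -> V PP"):    1,
    ("slept",  "VP -> V"):      12,
    ("slept",  "VP -> V PP"):    3,
})

def p_rule_given_head(rule, head):
    head_total = sum(c for (h, _), c in vp_rule_counts.items() if h == head)
    return vp_rule_counts[(head, rule)] / head_total

# P(VP -> V NP PP | VP, head = dumped) = 6 / (6 + 9 + 1) = 0.375
print(p_rule_given_head("VP -> V NP PP", "dumped"))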

64LING 180 Autumn 2007

Preferences

Subcat captures the affinity between VP heads (verbs) and the VP rules they go with. What about the affinity between VP heads and the heads of the other daughters of the VP? Back to our examples...

65LING 180 Autumn 2007

Example (right)

66LING 180 Autumn 2007

Example (wrong)


67LING 180 Autumn 2007

Preferences

The issue here is the attachment of the PP. So the affinities we care about are the ones between dumped and into vs. sacks and into. So count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize. Vs. the situation where sacks is a constituent with into as the head of a PP daughter.
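A sketch of using such normalized counts to choose the attachment (all counts hypothetical, just to show the comparison):

# How often a constituent headed by X has a PP daughter headed by "into",
# versus how often that head occurs at all.
pp_into_counts = {"dumped": 8, "sacks": 1}
head_totals    = {"dumped": 40, "sacks": 30}

def p_pp_into(head):
    return pp_into_counts.get(head, 0) / head_totals[head]

# Compare the two candidate attachments for "... dumped sacks into a bin"
verb_attach = p_pp_into("dumped")   # 8/40  = 0.20
noun_attach = p_pp_into("sacks")    # 1/30 ≈ 0.03
print("attach to VP(dumped)" if verb_attach > noun_attach else "attach to NP(sacks)")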

68LING 180 Autumn 2007

Preferences (2)

Consider the VPs:
Ate spaghetti with gusto
Ate spaghetti with marinara

The affinity of gusto for eat is much larger than its affinity for spaghetti. On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate.

69LING 180 Autumn 2007

Preferences (2)

Note the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs.

[Tree sketches: in "Ate spaghetti with gusto" the Pp(with) attaches to Vp(ate); in "Ate spaghetti with marinara" the Pp(with) attaches to Np(spaghetti).]

70LING 180 Autumn 2007

Summary

Context-Free Grammars
Parsing

Top-Down, Bottom-Up Metaphors
Dynamic Programming Parsers: CKY, Earley

Disambiguation
PCFG
Probabilistic Augmentations to Parsers
Treebanks

71LING 180 Autumn 2007

Optional section: the Earley algorithm

72LING 180 Autumn 2007

Problem (minor)

We said CKY requires the grammar to be binary (i.e., in Chomsky Normal Form). We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal. Except when you change the grammar, the trees come out wrong. All things being equal, we'd prefer to leave the grammar alone.
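To see why the trees come out wrong, here is a small sketch of the kind of binarization a CNF conversion performs; the @VP-Verb dummy category is just an illustrative naming convention, not the lecture's:

flat_rule = ("VP", ["Verb", "NP", "PP"])        # VP -> Verb NP PP

def binarize(lhs, rhs):
    """Split an n-ary rule into binary rules using introduced dummy categories."""
    rules = []
    while len(rhs) > 2:
        dummy = "@%s-%s" % (lhs, rhs[0])        # e.g. @VP-Verb
        rules.append((lhs, [rhs[0], dummy]))
        lhs, rhs = dummy, rhs[1:]
    rules.append((lhs, rhs))
    return rules

print(binarize(*flat_rule))
# [('VP', ['Verb', '@VP-Verb']), ('@VP-Verb', ['NP', 'PP'])]
# Any parse built with these rules contains an extra @VP-Verb node that was
# not in the original grammar, so the tree differs unless it is collapsed
# back afterwards.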


73LING 180 Autumn 2007

Earley Parsing

Allows arbitrary CFGs. Where CKY is bottom-up, Earley is top-down.

Fills a table in a single sweep over the input words. Table is length N+1; N is the number of words. Table entries represent:

– Completed constituents and their locations
– In-progress constituents
– Predicted constituents

74LING 180 Autumn 2007

States

The table entries are called states and are represented with dotted rules:

S -> . VP        A VP is predicted

NP -> Det . Nominal        An NP is in progress

VP -> V NP .        A VP has been found

75LING 180 Autumn 2007

States / Locations

It would be nice to know where these things are in the input, so...

S -> . VP [0,0]        A VP is predicted at the start of the sentence

NP -> Det . Nominal [1,2]        An NP is in progress; the Det goes from 1 to 2

VP -> V NP . [0,3]        A VP has been found starting at 0 and ending at 3
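One way to write these dotted-rule states down as a data structure (a fuller recognizer sketch follows the algorithm slides below):

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str          # left-hand side of the rule
    rhs: tuple        # right-hand side symbols
    dot: int          # position of the dot within rhs
    start: int        # where the state's span starts in the input
    end: int          # where it currently ends

    def complete(self):
        return self.dot == len(self.rhs)

predicted   = State("S",  ("VP",),            0, 0, 0)   # S -> . VP [0,0]
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)   # NP -> Det . Nominal [1,2]
found       = State("VP", ("V", "NP"),        2, 0, 3)   # VP -> V NP . [0,3]
print(found.complete())   # True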

76LING 180 Autumn 2007

Graphically

77LING 180 Autumn 2007

Earley

As with most dynamic programming approaches, the answer is found by looking in the table in the right place. In this case there should be an S state in the final column that spans from 0 to n+1 and is complete. If that's the case, you're done.

S -> α . [0,n+1]

78LING 180 Autumn 2007

Earley Algorithm

March through chart left-to-right. At each step, apply 1 of 3 operators:

Predictor
– Create new states representing top-down expectations

Scanner
– Match word predictions (rule with word after dot) to words

Completer
– When a state is complete, see what rules were looking for that completed constituent


79LING 180 Autumn 2007

Predictor

Given a state
With a non-terminal to right of dot
That is not a part-of-speech category
Create a new state for each expansion of the non-terminal
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So predictor looking at

– S -> . VP [0,0]

results in
– VP -> . Verb [0,0]
– VP -> . Verb NP [0,0]

80LING 180 Autumn 2007

Scanner

Given a state
With a non-terminal to right of dot
That is a part-of-speech category
If the next word in the input matches this part-of-speech
Create a new state with the dot moved over the non-terminal
So scanner looking at

– VP -> . Verb NP [0,0]
If the next word "book" can be a verb, add the new state

– VP -> Verb . NP [0,1]
Add this state to the chart entry following the current one
Note: the Earley algorithm uses top-down input to disambiguate POS
Only a POS predicted by some state can get added to the chart

81LING 180 Autumn 2007

Completer

Applied to a state when its dot has reached the right end of the rule
Parser has discovered a category over some span of input
Find and advance all previous states that were looking for this category

copy state, move dot, insert in current chart entry
Given

NP -> Det Nominal . [1,3]
VP -> Verb . NP [0,1]

Add
VP -> Verb NP . [0,3]

82LING 180 Autumn 2007

Earley: how do we know we are done?

How do we know when we are done? Find an S state in the final column that spans from 0 to n+1 and is complete. If that's the case, you're done.

S -> α . [0,n+1]

83LING 180 Autumn 2007

Earley

So sweep through the table from 0 to n+1... New predicted states are created by starting top-down from S. New incomplete states are created by advancing existing states as new constituents are discovered. New complete states are created in the same way.

84LING 180 Autumn 2007

Earley

More specifically...
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at N+1 to see if you have a winner
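A compact recognizer sketch following those steps, over a toy grammar and lexicon for "book that flight" (both invented for this example, not the lecture's full grammar). A state is a tuple (LHS, RHS, dot position, start), and chart[k] is the set of states ending at position k. As the next slides stress, this recognizes but builds no trees.

GRAMMAR = {
    "S":       [("VP",)],
    "VP":      [("Verb", "NP")],
    "NP":      [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Verb", "Det", "Noun"}

def earley_recognize(words):
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("GAMMA", ("S",), 0, 0))          # dummy start state
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in POS:            # SCANNER
                if k < n and rhs[dot] in LEXICON.get(words[k], set()):
                    chart[k + 1].add((lhs, rhs, dot + 1, start))
            elif dot < len(rhs):                              # PREDICTOR
                for expansion in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], expansion, 0, k)
                    if new not in chart[k]:
                        chart[k].add(new)
                        agenda.append(new)
            else:                                             # COMPLETER
                for (olhs, orhs, odot, ostart) in list(chart[start]):
                    if odot < len(orhs) and orhs[odot] == lhs:
                        new = (olhs, orhs, odot + 1, ostart)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
    # done if some complete S spans the whole input
    return any(l == "S" and d == len(r) and s == 0
               for (l, r, d, s) in chart[n])

print(earley_recognize("book that flight".split()))   # True
print(earley_recognize("book that".split()))          # False

The last check looks in the final chart column (position n, the (N+1)th column counting from 0) for a complete S covering the whole input, which is exactly the "winner" test from the slide above.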


85LING 180 Autumn 2007

Example

Book that flight. We should find... an S from 0 to 3 that is a completed state...

86LING 180 Autumn 2007

Example

87LING 180 Autumn 2007

Example

88LING 180 Autumn 2007

Example

89LING 180 Autumn 2007

Details

What kind of algorithms did we just describe (both Earley and CKY)?

Not parsers – recognizers
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree... no parser
– That's how we solve (not) an exponential problem in polynomial time

90LING 180 Autumn 2007

Back to Ambiguity

Did we solve it?


91LING 180 Autumn 2007

Ambiguity

92LING 180 Autumn 2007

Ambiguity

No... Both CKY and Earley will result in multiple S structures for the [0,n] table entry. They both efficiently store the sub-parts that are shared between multiple parses. But neither can tell us which one is right. Not a parser – a recognizer.

– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree... no parser
– That's how we solve (not) an exponential problem in polynomial time

93LING 180 Autumn 2007

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parser. Augment the "Completer" to point to where we came from.

94LING 180 Autumn 2007

Augmenting the chart with structural information

[Chart diagram: states S8–S13, with later completed states carrying backpointers to the earlier states (e.g. S8, S9) they were built from.]

95LING 180 Autumn 2007

Retrieving Parse Trees from Chart

All the possible parses for an input are in the table. We just need to read off all the backpointers from every complete S in the last column of the table. Find all the S -> X . [0,N+1]. Follow the structural traces from the Completer. Of course this won't be polynomial time, since there could be an exponential number of trees. So we can at least represent ambiguity efficiently.
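A small sketch of that last step: if completed states carry backpointers to the completed children they were built from, trees can be read off recursively. The chart states below are written out by hand for "book that flight", purely to show the traversal (not produced by the recognizer sketch above):

class CompletedState:
    def __init__(self, label, span, children=(), word=None):
        self.label, self.span = label, span
        self.children, self.word = children, word   # children = backpointers

    def tree(self):
        if self.word is not None:
            return (self.label, self.word)           # preterminal: attach the word
        return (self.label,) + tuple(c.tree() for c in self.children)

verb = CompletedState("Verb", (0, 1), word="book")
det  = CompletedState("Det", (1, 2), word="that")
noun = CompletedState("Noun", (2, 3), word="flight")
nom  = CompletedState("Nominal", (2, 3), children=(noun,))
np   = CompletedState("NP", (1, 3), children=(det, nom))
vp   = CompletedState("VP", (0, 3), children=(verb, np))
s    = CompletedState("S", (0, 3), children=(vp,))

print(s.tree())
# ('S', ('VP', ('Verb', 'book'), ('NP', ('Det', 'that'), ('Nominal', ('Noun', 'flight')))))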
