Chapter 12 Lexicalized and Probabilistic Parsing Guoqiang Shan University of Arizona November 30, 2006





Outline
- Probabilistic Context-Free Grammars
- Probabilistic CYK Parsing
- PCFG Problems

Probabilistic Context-Free Grammars: Intuition

To find the "correct" parse for ambiguous sentences, e.g. "Can you book TWA flights?", "The flights include a book."

Definition of Context-Free Grammar: a 4-tuple G = (N, Σ, P, S)
- N: a finite set of non-terminal symbols
- Σ: a finite set of terminal symbols, where N ∩ Σ = ∅
- P: a set of productions A → β, where A is in N and β is in (N ∪ Σ)*
- S: the start symbol, in N

Definition of Probabilistic Context-Free Grammar: a 5-tuple G = (N, Σ, P, S, D)
- D: a function P → [0, 1] assigning a probability to each rule in P
- Rules are written A → β [p], where p = D(A → β), e.g. A → a B [0.6], B → C D [0.3]
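One simple way to represent D is a map from rules to probabilities; for each nonterminal A, the probabilities of A's expansions must sum to 1. A minimal sketch (the dictionary layout is mine; the S rules are taken from the example grammar below):

```python
from collections import defaultdict

# D maps each rule (A, beta) to its probability p = D(A -> beta).
# Here: the three S expansions from the example grammar.
D = {
    ("S", ("NP", "VP")): 0.8,
    ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
}

# D must be a proper distribution over each nonterminal's expansions.
totals = defaultdict(float)
for (lhs, rhs), p in D.items():
    totals[lhs] += p

assert abs(totals["S"] - 1.0) < 1e-9
```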

PCFG Example

S → NP VP .8
S → Aux NP VP .15
S → VP .05
NP → Det Nom .2
NP → ProperN .35
NP → Nom .05
NP → Pronoun .4
Nom → Noun .75
Nom → Noun Nom .2
Nom → ProperN Nom .05
VP → Verb .55
VP → Verb NP .4
VP → Verb NP NP .05

Det → that .5
Det → the .8
Det → a .15
Noun → book .1
Noun → flights .5
Noun → meal .4
Verb → book .3
Verb → include .3
Verb → want .4
Aux → can .4
Aux → does .3
Aux → do .3
ProperN → TWA .4
ProperN → Denver .6
Pronoun → you .4
Pronoun → I .6

Probability of a Sentence in PCFG

Probability of a parse tree T of sentence S:
P(T, S) = Π D(r(n)), where the product runs over the nodes n of T, and r(n) is the rule used to expand n.

Since P(T, S) = P(T) · P(S | T), and a parse tree T uniquely determines its sentence S, we have P(S | T) = 1, so P(T) = P(T, S).

Probability of a sentence:
P(S) = Σ P(T), summed over all T in τ(S), the set of parse trees of S. In particular, for an unambiguous sentence, P(S) = P(T).

Example: two parse trees, Tl and Tr, of the same ambiguous sentence

P(Tl) = 0.15 × 0.40 × 0.05 × 0.05 × 0.35 × 0.75 × 0.40 × 0.40 × 0.30 × 0.40 × 0.50 = 3.78 × 10⁻⁷

P(Tr) = 0.15 × 0.40 × 0.40 × 0.05 × 0.05 × 0.75 × 0.40 × 0.40 × 0.30 × 0.40 × 0.50 = 4.32 × 10⁻⁷
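Each tree's probability is just the product of the probabilities of the rules it uses, and P(S) sums over both parses. A quick check of the arithmetic above:

```python
# Rule probabilities used by the two parse trees Tl and Tr above.
left = [0.15, 0.40, 0.05, 0.05, 0.35, 0.75, 0.40, 0.40, 0.30, 0.40, 0.50]
right = [0.15, 0.40, 0.40, 0.05, 0.05, 0.75, 0.40, 0.40, 0.30, 0.40, 0.50]

def tree_prob(rule_probs):
    """P(T, S) = product of D(r(n)) over the rules used in the tree."""
    p = 1.0
    for r in rule_probs:
        p *= r
    return p

p_left, p_right = tree_prob(left), tree_prob(right)
print(p_left)            # ≈ 3.78e-07
print(p_right)           # ≈ 4.32e-07
print(p_left + p_right)  # P(S) sums over all parses: ≈ 8.1e-07
```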

Probabilistic CYK Parsing of PCFG: a bottom-up approach

Dynamic programming: fill tables of partial solutions to the sub-problems until they contain all the solutions to the entire problem.

Input
- A grammar in CNF: ε-free, each production of the form A → w or A → B C
- n words w1, w2, …, wn

Data structures
- Π[i, j, A]: the maximum probability of a constituent with nonterminal A spanning j words starting at wi
- β[i, j, A] = {k, B, C}: the rule A → B C and split point k such that B spans the first k words from wi (used to rebuild the parse tree)

Output
- The probability of the maximum-probability parse is Π[1, n, S]: the root of the parse tree is S, and it spans the entire string

Base case
- For input spans of length one, Π[i, 1, A] = D(A → wi) for each rule A → wi

Recursive case
- For spans of length j > 1, A derives wij if there exist a rule A → B C and a split point k, 0 < k < j, such that B derives wik (already known) and C derives w(i+k)(j−k) (already known)
- The probability that A derives wij is the product of D(A → B C) and the two known sub-span probabilities
- If more than one rule A → B C applies, pick the one that maximizes the probability of wij

CYK Algorithm

[Slide figure: pseudocode filling the Π[i, j, A] probability table and the {k, B, C} back-pointer table]

My implementation is in lectura under directory /home/shan/538share/pcyk.c
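The recurrence above can be sketched in Python (a simplified illustration of the Π table computation using dictionaries rather than a 3-D array; the names LEXICAL, BINARY, and pcyk are mine, not taken from the C implementation, and the grammar is the CNF version built in the next slides):

```python
from collections import defaultdict

# CNF grammar from the example: lexical rules A -> w (with unit productions
# already collapsed into them) and binary rules A -> B C.
LEXICAL = {
    "can": {"Aux": .4},
    "you": {"Pronoun": .4, "NP": .16},
    "book": {"Noun": .1, "Verb": .3, "VP": .165, "Nom": .075, "NP": .00375, "S": .00825},
    "TWA": {"ProperN": .4, "NP": .14},
    "flights": {"Noun": .5, "Nom": .375, "NP": .01875},
}
BINARY = [
    ("S", "NP", "VP", .8), ("S", "Aux", "NV", .15), ("NV", "NP", "VP", 1.0),
    ("S", "Verb", "NP", .02), ("S", "Verb", "DNP", .0025),
    ("NP", "Det", "Nom", .2), ("NP", "Noun", "Nom", .01),
    ("NP", "ProperN", "Nom", .0025),
    ("Nom", "Noun", "Nom", .2), ("Nom", "ProperN", "Nom", .05),
    ("VP", "Verb", "NP", .4), ("VP", "Verb", "DNP", .05),
    ("DNP", "NP", "NP", 1.0),
]

def pcyk(words):
    n = len(words)
    # pi[(i, j)] maps nonterminal A to the best probability over the
    # j-word span starting at word i (0-based here).
    pi = defaultdict(dict)
    for i, w in enumerate(words):
        pi[(i, 1)] = dict(LEXICAL[w])       # base case: length-1 spans
    for j in range(2, n + 1):               # span length
        for i in range(n - j + 1):          # span start
            cell = {}
            for k in range(1, j):           # split point
                for A, B, C, p in BINARY:
                    pb = pi[(i, k)].get(B)
                    pc = pi[(i + k, j - k)].get(C)
                    if pb and pc:
                        prob = p * pb * pc  # D(A -> B C) times both sub-spans
                        if prob > cell.get(A, 0.0):
                            cell[A] = prob  # keep only the best derivation
            pi[(i, j)] = cell
    return pi[(0, n)]

best = pcyk("can you book TWA flights".split())
print(best["S"])  # ≈ 4.32e-7, matching P(Tr) above
```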

PCFG Example – Revisited (to be rewritten as CNF)

S → NP VP .8
S → Aux NP VP .15
S → VP .05
NP → Det Nom .2
NP → ProperN .35
NP → Nom .05
NP → Pronoun .4
Nom → Noun .75
Nom → Noun Nom .2
Nom → ProperN Nom .05
VP → Verb .55
VP → Verb NP .4
VP → Verb NP NP .05

Det → that .5
Det → the .8
Det → a .15
Noun → book .1
Noun → flights .5
Noun → meal .4
Verb → book .3
Verb → include .3
Verb → want .4
Aux → can .4
Aux → does .3
Aux → do .3
ProperN → TWA .4
ProperN → Denver .6
Pronoun → you .4
Pronoun → I .6

Example (CYK Parsing) – Rewrite as CNF
(Parenthesized rules are the original non-CNF rules; the rules that follow each one are its CNF replacements.)

S → NP VP .8
(S → Aux NP VP .15)
S → Aux NV .15
NV → NP VP 1.0
(S → VP .05)
S → book .00825
S → include .00825
S → want .011
S → Verb NP .02
S → Verb DNP .0025
NP → Det Nom .2
(NP → ProperN .35)
NP → TWA .14
NP → Denver .21
(NP → Nom .05)
NP → book .00375
NP → flights .01875
NP → meal .015
NP → Noun Nom .01
NP → ProperN Nom .0025
(NP → Pronoun .4)
NP → you .16
NP → I .24
(Nom → Noun .75)
Nom → book .075
Nom → flights .375
Nom → meal .3
Nom → Noun Nom .2
Nom → ProperN Nom .05
(VP → Verb .55)
VP → book .165
VP → include .165
VP → want .22
VP → Verb NP .4
(VP → Verb NP NP .05)
VP → Verb DNP .05
DNP → NP NP 1.0
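As a sanity check on the collapsed probabilities above: chains of unit productions are folded into single CNF rules by multiplying their probabilities (the numbers come from the grammar above):

```python
# S -> VP (.05), VP -> Verb (.55), Verb -> book (.3)  collapses to  S -> book
s_book = 0.05 * 0.55 * 0.3
# VP -> Verb (.55), Verb -> want (.4)  collapses to  VP -> want
vp_want = 0.55 * 0.4
# NP -> ProperN (.35), ProperN -> TWA (.4)  collapses to  NP -> TWA
np_twa = 0.35 * 0.4

assert abs(s_book - 0.00825) < 1e-12
assert abs(vp_want - 0.22) < 1e-12
assert abs(np_twa - 0.14) < 1e-12
```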

Example (CYK Parsing) – Π matrix (spans of length 1)

Cell [i, j] holds the best probability of each nonterminal spanning j words starting at word wi.

[1, 1] can: Aux .4
[2, 1] you: Pronoun .4, NP .16
[3, 1] book: Noun .1, Verb .3, VP .165, Nom .075, NP .00375, S .00825
[4, 1] TWA: ProperN .4, NP .14
[5, 1] flights: Noun .5, Nom .375, NP .01875

Sentence: can you book TWA flights

Example (CYK Parsing) – Π matrix (spans of length 2 added)

[1, 2] can you: (no constituent)
[2, 2] you book: S .02112, NV .0264, DNP .0006
[3, 2] book TWA: S .00084, VP .0168, DNP .000525
[4, 2] TWA flights: NP .000375, Nom .0075, DNP .002625

(length-1 cells as before)

Example (CYK Parsing) – Π matrix (spans of length 3 added)

[1, 3] can you book: S .001584
[2, 3] you book TWA: S .0021504, NV .002688
[3, 3] book TWA flights: S .00000225, NP .0000075, Nom .00015, VP .000045, DNP .000001406

(shorter spans as before; [1, 2] remains empty)

Example (CYK Parsing) – Π matrix (spans of length 4 added)

[1, 4] can you book TWA: S .00016128
[2, 4] you book TWA flights: S .00000576, NV .0000072, DNP .0000012

(shorter spans as before)

Example (CYK Parsing) – Π matrix (span of length 5 added)

[1, 5] can you book TWA flights: S .000000432

The best parse of the whole sentence therefore has probability Π[1, 5, S] = 4.32 × 10⁻⁷.

(shorter spans as before)

Example (CYK Parsing) – β matrix

Cell [i, j] records, for each constituent, the rule A → B C and split point k (B spans the first k words) used to rebuild the parse tree.

[1, 2]: N/A
[1, 3]: S → Aux NV, k = 1
[1, 4]: S → Aux NV, k = 1
[1, 5]: S → Aux NV, k = 1
[2, 2]: S → NP VP, k = 1; NV → NP VP, k = 1; DNP → NP NP, k = 1
[2, 3]: S → NP VP, k = 1; NV → NP VP, k = 1
[2, 4]: S → NP VP, k = 1; NV → NP VP, k = 1; DNP → NP NP, k = 1
[3, 2]: S → Verb NP, k = 1; VP → Verb NP, k = 1; DNP → NP NP, k = 1
[3, 3]: S → Verb NP, k = 1; NP → Noun Nom, k = 1; Nom → Noun Nom, k = 1; VP → Verb NP, k = 1; DNP → NP NP, k = 1
[4, 2]: NP → ProperN Nom, k = 1; Nom → ProperN Nom, k = 1; DNP → NP NP, k = 1

Sentence: can you book TWA flights

PCFG Problems: Independence Assumption

Assumption: the expansion of one nonterminal is independent of the expansion of others.

However, examination shows that how a node expands depends on its location in the parse tree:
- 91% of subjects are pronouns: "She's able to take her baby to work with her." (91%) vs. "Uh, my wife worked until we had a family." (9%)
- But only 34% of objects are pronouns: "Some laws absolutely prohibit it." (34%) vs. "All the people signed confessions." (66%)

PCFG Problems: Lack of Lexical Sensitivity

Lexical information in a PCFG can be represented only via the probabilities of pre-terminal nodes (such as Verb, Noun, Det).

However, lexical information and dependencies turn out to be important in modeling syntactic probabilities.

Example: "Moscow sent more than 100,000 soldiers into Afghanistan."

In a PCFG, "into Afghanistan" may attach to the NP ("more than 100,000 soldiers") or to the VP ("sent"). Statistics show that NP attachment occurs 67% (or 52%) of the time, so a PCFG will produce an incorrect result here. Why? The verb "send" subcategorizes for a destination, which can be expressed with the preposition "into". In fact, when the verb is "send", "into" always attaches to it.

PCFG Problems: Coordination Ambiguity

Consider the example: "dogs in houses and cats".

Semantically, "dogs" is a better conjunct for "cats" than "houses" is. Thus the parse [dogs in [NP houses and cats]] intuitively sounds unnatural and should be dispreferred.

However, a PCFG assigns both parses the same probability, since the two structures use exactly the same rules.

References
- NLTK Tutorial, Probabilistic Parsing: http://nltk.sourceforge.net/tutorial/pcfg/index.html
- Stanford Probabilistic Parsing Group: http://nlp.stanford.edu/projects/stat-parsing.shtml
- CYK algorithm (general): http://en.wikipedia.org/wiki/CYK_algorithm
- CYK algorithm, step-by-step web demo: http://www2.informatik.hu-berlin.de/~pohl/cyk.php?action=example
- Probabilistic CYK parsing: http://www.ifi.unizh.ch/cl/gschneid/ParserVorl/ParserVorl7.pdf and http://catarina.ai.uiuc.edu/ling306/slides/lecture23.pdf

Questions?