SI485i : NLP Set 8 PCFGs and the CKY Algorithm


SI485i : NLP

Set 8

PCFGs and the CKY Algorithm


PCFGs

• We saw how CFGs can model English (sort of)
• Probabilistic CFGs put weights on the production rules
  • NP -> DET NN with probability 0.34
  • NP -> NN NN with probability 0.16


PCFGs

• We still parse sentences and come up with a syntactic derivation tree
• But now we can talk about how confident we are in the tree: P(tree)!


Buffalo Example

• What is the probability of this tree?
• It’s the product of the probabilities of all the rules used in it, e.g., P(S -> NP VP)


PCFG Formalized

• G = (T, N, S, R, P)
• T is the set of terminals
• N is the set of nonterminals
  • For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
• S is the start symbol (one of the nonterminals)
• R is the set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals
• P(R) gives the probability of each rule
• A grammar G generates a language model L

Some slides adapted from Chris Manning
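To make the 5-tuple concrete, here is a minimal sketch of one way to hold G = (T, N, S, R, P) in Python; the toy grammar and the dict layout are illustrative assumptions, not something from the slides.

```python
# Minimal sketch of G = (T, N, S, R, P); the toy rules below are illustrative.
grammar = {
    "terminals":    {"the", "dog", "barks"},                 # T
    "nonterminals": {"S", "NP", "VP", "DT", "NN", "VBZ"},    # N
    "preterminals": {"DT", "NN", "VBZ"},                     # P ⊂ N, rewrite as terminals
    "start":        "S",                                     # S
    # R and P together: each rule X -> gamma carries its probability.
    "rules": {
        "S":   [(("NP", "VP"), 1.0)],
        "NP":  [(("DT", "NN"), 1.0)],
        "VP":  [(("VBZ",), 1.0)],
        "DT":  [(("the",), 1.0)],
        "NN":  [(("dog",), 1.0)],
        "VBZ": [(("barks",), 1.0)],
    },
}

# Sanity check: the rules sharing a left-hand side form a probability distribution.
for lhs, expansions in grammar["rules"].items():
    assert abs(sum(p for _, p in expansions) - 1.0) < 1e-9, lhs
```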


Some notation

• w_1n = w_1 … w_n = the word sequence from 1 to n
• w_ab = the subsequence w_a … w_b
• We’ll write P(Ni → ζj) to mean P(Ni → ζj | Ni)
  • Take note, this is a conditional probability: for instance, the probabilities of all rules headed by NP must sum to 1!
• We’ll want to calculate the best tree T: max_T P(T ⇒* w_ab)


Trees and Probabilities

• P(t) -- the probability of a tree is the product of the probabilities of the rules used to generate it
• P(w_1n) -- the probability of the string is the sum of the probabilities of all possible trees that have the string as their yield
  • P(w_1n) = Σ_j P(w_1n, t_j), where t_j is a parse of w_1n
  •         = Σ_j P(t_j)
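Both definitions drop straight into code. A sketch, assuming trees are nested tuples like ("S", ("NP", ...), ("VP", ...)) and rule probabilities live in a dict keyed by (lhs, rhs) pairs; both formats are assumptions for illustration.

```python
def tree_prob(tree, rule_probs):
    """P(t): the product of the probabilities of the rules used to generate the tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return rule_probs[(label, (children[0],))]   # preterminal -> word
    rhs = tuple(child[0] for child in children)
    p = rule_probs[(label, rhs)]
    for child in children:
        p *= tree_prob(child, rule_probs)
    return p

def string_prob(parses, rule_probs):
    """P(w_1n) = sum_j P(t_j) over all parses t_j whose yield is w_1n."""
    return sum(tree_prob(t, rule_probs) for t in parses)
```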


Example PCFG


P(tree) computation


Time to Parse

• Let’s parse!!
• Almost ready…
• Trees must be in Chomsky Normal Form first


Chomsky Normal Form

• All rules are Z -> X Y or Z -> w
• Transforming a grammar to CNF does not change its weak generative capacity
  • Remove all unary rules and empties
  • Transform n-ary rules: VP -> V NP PP becomes VP -> V @VP-V and @VP-V -> NP PP (see the sketch below)
• Why do we do this? Parsing is easier now
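A minimal sketch of the n-ary transform just described, assuming rules are (lhs, rhs-tuple, probability) triples; the @-naming mirrors the slide.

```python
def binarize(lhs, rhs, prob):
    """Split an n-ary rule into a chain of binary rules, introducing @ symbols."""
    rules = []
    while len(rhs) > 2:
        new_sym = "@%s-%s" % (lhs, rhs[0])          # e.g. @VP-V
        rules.append((lhs, (rhs[0], new_sym), prob))
        lhs, rhs, prob = new_sym, rhs[1:], 1.0      # intermediate rules get p = 1
    rules.append((lhs, tuple(rhs), prob))
    return rules

print(binarize("VP", ("V", "NP", "PP"), 0.2))
# [('VP', ('V', '@VP-V'), 0.2), ('@VP-V', ('NP', 'PP'), 1.0)]
```

Giving the introduced @ rules probability 1 keeps the binarized tree's probability equal to the original rule's probability.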


Converting into CNF


The CKY Algorithm

• Cocke-Kasami-Younger (CKY)

Dynamic programming is back!


The CKY Algorithm

Filling one chart cell, two competing NP analyses:

NP -> NN NNS (0.13): p = 0.13 x .0023 x .0014 ≈ 4.19 x 10^-7

NP -> NNP NNS (0.056): p = 0.056 x .001 x .0014 = 7.84 x 10^-8
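That comparison is exactly the cell update: score each rule whose right-hand side matches a pair of child constituents as rule probability times the two child probabilities, and keep the best. A tiny sketch with the slide's numbers (the tuple format is assumed for illustration):

```python
candidates = [
    ("NP -> NN NNS",  0.13,  0.0023, 0.0014),
    ("NP -> NNP NNS", 0.056, 0.0010, 0.0014),
]
for name, p_rule, p_left, p_right in candidates:
    print(name, p_rule * p_left * p_right)         # ~4.19e-07 vs 7.84e-08
best = max(candidates, key=lambda c: c[1] * c[2] * c[3])
print("keep:", best[0])                            # NP -> NN NNS
```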


The CKY Algorithm

• What is the runtime? O( ?? )
  • Note that each cell must check all pairs of children below it: there are O(n^2) cells and O(n) split points per cell, so the runtime is O(n^3) times a grammar-dependent constant
• Binarizing the CFG rules is a must. The complexity explodes if you do not (see the sketch below)
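Putting it together, a compact sketch of probabilistic CKY over a CNF grammar. The input formats are assumptions: lex maps each word to (preterminal, probability) pairs, and rules maps a child pair (B, C) to (A, probability) entries for rules A -> B C.

```python
def cky(words, lex, rules):
    n = len(words)
    # chart[i][j]: best probability of deriving words[i:j] from each nonterminal;
    # back[i][j]: the split point and children that achieved it.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    back  = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # length-1 spans
        for tag, p in lex.get(w, []):
            chart[i][i + 1][tag] = p
    for length in range(2, n + 1):                 # longer spans, bottom up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # every split point
                for B, pb in chart[i][k].items():
                    for C, pc in chart[k][j].items():
                        for A, pr in rules.get((B, C), []):
                            p = pr * pb * pc
                            if p > chart[i][j].get(A, 0.0):
                                chart[i][j][A] = p
                                back[i][j][A] = (k, B, C)
    return chart, back
```

The loops over O(n^2) spans and O(n) split points are the source of the O(n^3) runtime, and the (B, C) pair lookup is exactly why binarization matters: with n-ary rules a cell would have to enumerate every way of splitting its span into three or more pieces.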


Evaluating CKY

• How do we know if our parser works?

• Count the number of correct labels in your table: a constituent is correct only if both the label and the span it dominates match the gold tree
  • [ label, start, finish ]
• Most trees have an error or two!
• Count how many predicted spans are correct and how many are wrong, then compute precision and recall (see the sketch below)
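A sketch of that labeled-span scoring, with made-up gold and predicted spans for illustration:

```python
def precision_recall(gold_spans, pred_spans):
    """Compare [label, start, finish] triples from gold and predicted trees."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall    = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
pred = {("S", 0, 5), ("NP", 0, 3), ("VP", 2, 5)}   # one wrong NP span
print(precision_recall(gold, pred))                # (0.667, 0.667)
```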


Probabilities?

• Where do the probabilities come from?
  • P( NP -> DT NN ) = ???
• Penn Treebank: a large collection of newspaper articles whose sentences have been manually annotated with full parse trees
• Estimate each rule's probability from the treebank counts:
  • P( NP -> DT NN ) = C( NP -> DT NN ) / C( NP )
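A sketch of that maximum-likelihood estimate, reusing the nested-tuple tree format assumed earlier: count every rule use in the treebank and normalize by the count of its left-hand side.

```python
from collections import Counter

def estimate_probs(trees):
    """P(X -> gamma) = C(X -> gamma) / C(X), counted over treebank trees."""
    rule_counts, lhs_counts = Counter(), Counter()
    def visit(tree):
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            rhs = (children[0],)                   # preterminal -> word
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                visit(child)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
    for t in trees:
        visit(t)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```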