
Page 1:

ICS 482 Natural Language Processing
Probabilistic Context-Free Grammars (Chapter 14)
Dr. Muhammed Al-Mulhem
March 1, 2009

Page 2:

SCFG

A probabilistic CFG (PCFG), or stochastic CFG (SCFG), is the simplest augmentation of the CFG.

A CFG G is defined as (N, Σ, R, S); an SCFG is likewise defined as (N, Σ, R, S), where:

• N is a set of non-terminal symbols.
• Σ is a set of terminal symbols (N ∩ Σ = Ø).
• R is a set of production rules, each of the form A → β [p], where A ∈ N, β ∈ (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(A → β).
• S is the start symbol, S ∈ N.

Page 3:

SCFG

P(A → β) expresses the probability that A will be expanded to β.

If we consider all the possible expansions of a non-terminal A, the sum of their probabilities must be 1:

Σ_β P(A → β) = 1
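To make the constraint concrete, here is a minimal Python sketch (illustrative, not from the slides) that stores each rule A → β [p] under its left-hand side and checks that every non-terminal's expansion probabilities sum to 1; the VP rules anticipate Example 1 below.

from collections import defaultdict

# Each rule A -> beta [p] is stored as lhs -> list of (rhs, probability).
rules = defaultdict(list)
for lhs, rhs, p in [
    ("VP", ("Verb",), 0.55),
    ("VP", ("Verb", "NP"), 0.40),
    ("VP", ("Verb", "NP", "NP"), 0.05),
]:
    rules[lhs].append((rhs, p))

# For every non-terminal A, the probabilities of its expansions must sum to 1.
for lhs, expansions in rules.items():
    total = sum(p for _, p in expansions)
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"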

Page 4:

Example 1

Attach probabilities to grammar rules. The probabilities of all the rules with a given left-hand side non-terminal, such as VP, must sum to 1:

VP → Verb        .55
VP → Verb NP     .40
VP → Verb NP NP  .05

Page 5:

Example 2

NP → Det N      0.4
NP → NPposs N   0.1
NP → Pronoun    0.2
NP → NP PP      0.1
NP → N          0.2

Subtree: [NP [NP Det N] PP]

P(subtree above) = 0.1 × 0.4 = 0.04
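Since a subtree's probability is just the product of the probabilities of the rules it uses, it can be computed recursively. A small illustrative sketch (the tuple encoding and the rule_prob table are assumptions, not from the slides):

# A tree node is (label, child, child, ...); a leaf is a bare string.
rule_prob = {
    ("NP", ("NP", "PP")): 0.1,   # from Example 2
    ("NP", ("Det", "N")): 0.4,
}

def tree_prob(tree):
    """Probability of a (sub)tree = product of its rule probabilities."""
    if isinstance(tree, str):
        return 1.0               # a leaf: no rule applied here
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob.get((label, rhs), 1.0)  # unexpanded categories contribute 1
    for child in children:
        p *= tree_prob(child)
    return p

print(tree_prob(("NP", ("NP", "Det", "N"), "PP")))  # 0.1 * 0.4 = 0.04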

Page 6:

Example 3

(Parse tree figure.)

Page 7:

Example 3

(Parse tree figure.)

Page 8:

These are the rules used to generate the trees above (not the full grammar). (The rule list with its probabilities was shown here as a figure.)

Page 9:

Example 3

The probability of each parse tree is calculated by multiplying the probabilities of the production rules used to generate it:

P(T1) = .15 × .40 × .05 × .05 × .35 × .75 × .40 × .40 × .30 × .40 × .50 = 3.78 × 10⁻⁷

P(T2) = .15 × .40 × .40 × .05 × .05 × .75 × .40 × .40 × .30 × .40 × .50 = 4.32 × 10⁻⁷
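These products are easy to verify mechanically, e.g. (Python 3.8+):

import math

p_t1 = math.prod([.15, .40, .05, .05, .35, .75, .40, .40, .30, .40, .50])
p_t2 = math.prod([.15, .40, .40, .05, .05, .75, .40, .40, .30, .40, .50])
print(f"{p_t1:.3g} {p_t2:.3g}")  # 3.78e-07 4.32e-07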

Page 10:

Example 4

S → NP VP    1.0
PP → P NP    1.0
VP → V NP    0.7
VP → VP PP   0.3
P → with     1.0
V → saw      1.0

NP → NP PP        0.4
NP → astronomers  0.1
NP → ears         0.18
NP → saw          0.04
NP → stars        0.18
NP → telescopes   0.1

Page 11:

Example 4: Astronomers saw stars with ears

(The two parse trees, t1 and t2, were shown here as figures: t1 attaches the PP "with ears" to the object NP; t2 attaches it to the VP.)

Page 12:

The probabilities of the two parse trees

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072

P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804

Page 13:

Example 4

S → NP VP 1.0

PP → P NP 1.0

VP → V NP 0.7

VP → VP PP 0.3

P → with 1.0

V → saw 1.0

NP → NP PP 0.4

NP → astronomers 0.1

NP → ears 0.18

NP → saw 0.04

NP → stars 0.18

NP → telescopes 0.1

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804

Page 14:

Probabilistic CFGs

• The probabilistic model: assigning probabilities to parse trees
• Getting the probabilities for the model
• Parsing with probabilities
  - A slight modification to the dynamic programming approach
  - The task is to find the max-probability tree for an input

Page 15:

Getting the Probabilities

• From an annotated database (a treebank)
• Learned from a corpus

Page 16:

Treebank

• Get a large collection of parsed sentences.
• Collect counts for each non-terminal rule expansion in the collection.
• Normalize.
• Done. (A sketch of this count-and-normalize step follows below.)
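The count-and-normalize step is maximum-likelihood estimation: P(A → β) = Count(A → β) / Count(A). A minimal sketch over a toy list of rule occurrences (the data is invented for illustration):

from collections import Counter

# Rule occurrences harvested from a (toy) collection of parsed sentences.
occurrences = [
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")), ("VP", ("Verb", "NP")),
    ("NP", ("Det", "N")), ("NP", ("Det", "N")), ("NP", ("Pronoun",)),
]

rule_counts = Counter(occurrences)                   # Count(A -> beta)
lhs_counts = Counter(lhs for lhs, _ in occurrences)  # Count(A)

# Normalize: P(A -> beta) = Count(A -> beta) / Count(A)
probs = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

for (lhs, rhs), p in sorted(probs.items()):
    print(lhs, "->", " ".join(rhs), round(p, 3))
# NP -> Det N 0.667, NP -> Pronoun 0.333, VP -> Verb 0.333, VP -> Verb NP 0.667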

Page 17:

Learning

What if you don't have a treebank (and can't get one)?

• Take a large collection of text and parse it.
• For syntactically ambiguous sentences, collect all the possible parses.
• Prorate the rule counts gathered from the ambiguous cases by the probability of each parse.
• Proceed as you did with a treebank.

This is the Inside-Outside algorithm. (A sketch of the prorating step follows below.)
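To make the prorating step concrete: each parse of an ambiguous sentence contributes its rules fractionally, weighted by that parse's share of the total probability mass. This is one expectation step of Inside-Outside, sketched naively over explicit parse lists (the numbers reuse Example 4; the encoding is illustrative, not from the slides):

from collections import defaultdict

# Two candidate parses of one ambiguous sentence: (parse prob, rules used).
parses = [
    (0.0009072, [("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NP", "PP"))]),
    (0.0006804, [("S", ("NP", "VP")), ("VP", ("VP", "PP")), ("VP", ("V", "NP"))]),
]

total = sum(p for p, _ in parses)
expected = defaultdict(float)
for p, rules_used in parses:
    weight = p / total            # this parse's share of the probability mass
    for rule in rules_used:
        expected[rule] += weight  # fractional, not whole, counts

for rule, c in expected.items():
    print(rule, round(c, 3))  # e.g. NP -> NP PP gets 0.571, VP -> VP PP gets 0.429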

Page 18:

Assumptions

• We're assuming that there is a grammar to parse with.
• We're assuming the existence of a large, robust dictionary with parts of speech.
• We're assuming the ability to parse (i.e., a parser).

Given all that, we can parse probabilistically.

Page 19:

Typical Approach

• Bottom-up dynamic programming approach
• Assign probabilities to constituents as they are completed and placed in the table
• Use the max probability for each constituent going up

Page 20:

Max Probability

Say we're talking about a final part of a parse:

S0 → NPi VPj

(an S built from an NP spanning positions 0 to i and a VP spanning i to j). The probability of the S is:

P(S → NP VP) × P(NP) × P(VP)

The last two factors are already known, since we're doing bottom-up parsing.

Page 21:

Max

• The P(NP) is known.
• What if there are multiple NPs for the span of text in question (0 to i)? Take the max. (Why?)
• This does not mean that other kinds of constituents for the same span are ignored (they might still be in the solution).

Page 22:

Probabilistic Parsing

• Probabilistic CYK (Cocke-Younger-Kasami) algorithm for parsing PCFGs
• A bottom-up dynamic programming algorithm
• Assume the PCFG is in Chomsky Normal Form (every production is either A → B C or A → a)

Page 23:

Chomsky Normal Form (CNF)

All rules have one of two forms:

A → B C, where A, B, and C are non-terminals
A → a, where a is a terminal

Page 24:

Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
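The distinction is easy to check mechanically. A small sketch (the grammar encoding is an assumption, not from the slides):

def is_cnf(rules, nonterminals):
    """True iff every rule is A -> B C (two non-terminals) or A -> a (a terminal)."""
    for lhs, rhs in rules:
        two_nts = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        one_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (two_nts or one_terminal):
            return False
    return True

nts = {"S", "A"}
good = [("S", ("A", "S")), ("S", ("a",)), ("A", ("S", "A")), ("A", ("b",))]
bad = [("S", ("A", "S")), ("S", ("A", "A", "S")), ("A", ("S", "A")), ("A", ("a", "a"))]
print(is_cnf(good, nts), is_cnf(bad, nts))  # True False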

Page 25:

Observations

• Chomsky Normal Form is convenient for parsing and for proving theorems.
• Any context-free grammar can be converted to Chomsky Normal Form.

Page 26:

Probabilistic CYK Parsing of PCFGs

CYK Algorithm: a bottom-up parser

Input:
• A Chomsky-Normal-Form PCFG, G = (N, Σ, P, S, D)
• Assume the non-terminals have indices 1, 2, …, |N|, and the start symbol S has index 1
• n words w1, …, wn

Data structure:
• A dynamic programming array π[i, j, a] holds the maximum probability for a constituent with non-terminal index a spanning words i..j

Output:
• The maximum-probability parse, π[1, n, 1]

Page 27:

Base Case

• CYK fills out π[i, j, a] by induction.
• Base case: input strings of length 1 (individual words wi).
• In CNF, the probability of a given non-terminal A expanding to a single word wi can come only from the rule A → wi, i.e., it is P(A → wi).

Page 28:

Probabilistic CYK Algorithm [Corrected]

function CYK(words, grammar) returns the most probable parse and its probability

  for i ← 1 to num_words
    for a ← 1 to num_nonterminals
      if (A → wi) is in grammar then π[i, i, a] ← P(A → wi)

  for span ← 2 to num_words
    for begin ← 1 to num_words − span + 1
      end ← begin + span − 1
      for m ← begin to end − 1
        for a ← 1 to num_nonterminals
          for b ← 1 to num_nonterminals
            for c ← 1 to num_nonterminals
              prob ← π[begin, m, b] × π[m+1, end, c] × P(A → B C)
              if prob > π[begin, end, a] then
                π[begin, end, a] ← prob
                back[begin, end, a] ← {m, b, c}

  return build_tree(back[1, num_words, 1]), π[1, num_words, 1]
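A compact Python rendering of the pseudocode above (a sketch under the slides' assumptions: CNF grammar; dictionaries stand in for the indexed π array, and we iterate over the grammar's rules instead of all (a, b, c) index triples, which is equivalent but faster):

from collections import defaultdict

def prob_cyk(words, lexical, binary):
    """lexical: {(A, word): prob} for A -> word; binary: {(A, B, C): prob}
    for A -> B C. Returns (pi, back) tables for the CNF PCFG."""
    n = len(words)
    pi = defaultdict(float)  # pi[(i, j, A)] = max prob of A spanning words i..j
    back = {}
    # Base case: spans of length 1 come from lexical rules A -> w_i.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                pi[(i, i, A)] = p
    # Recursive case: combine two smaller spans with a binary rule A -> B C.
    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C), p in binary.items():
                    prob = pi[(begin, m, B)] * pi[(m + 1, end, C)] * p
                    if prob > pi[(begin, end, A)]:
                        pi[(begin, end, A)] = prob
                        back[(begin, end, A)] = (m, B, C)
    return pi, back

# Example 4's grammar, which is already in CNF.
lexical = {("P", "with"): 1.0, ("V", "saw"): 1.0,
           ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
           ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}
binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
          ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
          ("NP", "NP", "PP"): 0.4}

pi, _ = prob_cyk("astronomers saw stars with ears".split(), lexical, binary)
print(pi[(0, 4, "S")])  # ≈ 0.0009072, the probability of the better parse t1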

Page 29:

The CYK Membership Algorithm

Input:
• A grammar G in Chomsky Normal Form
• A string w

Output:
• Decide whether w ∈ L(G)

Page 30:

The Algorithm

Input example:

• Grammar G:
  S → AB
  A → BB
  A → a
  B → AB
  B → b

• String: w = aabbb

Page 31:

All substrings of length 1:  a, a, b, b, b
All substrings of length 2:  aa, ab, bb, bb
All substrings of length 3:  aab, abb, bbb
All substrings of length 4:  aabb, abbb
All substrings of length 5:  aabbb

Page 32:

Find the variables that derive each substring, starting with length 1 (using A → a and B → b):

a: {A}    a: {A}    b: {B}    b: {B}    b: {B}
aa: ?     ab: ?     bb: ?     bb: ?
aab: ?    abb: ?    bbb: ?
aabb: ?   abbb: ?
aabbb: ?

Grammar G:  S → AB,  A → BB | a,  B → AB | b

Page 33:

Next, the substrings of length 2 (aa has no derivation, since no rule has right-hand side AA):

a: {A}    a: {A}      b: {B}    b: {B}    b: {B}
aa: { }   ab: {S,B}   bb: {A}   bb: {A}
aab: ?    abb: ?      bbb: ?
aabb: ?   abbb: ?
aabbb: ?

Grammar G:  S → AB,  A → BB | a,  B → AB | b

Page 34:

The completed table:

a: {A}       a: {A}       b: {B}      b: {B}    b: {B}
aa: { }      ab: {S,B}    bb: {A}     bb: {A}
aab: {S,B}   abb: {A}     bbb: {S,B}
aabb: {A}    abbb: {S,B}
aabbb: {S,B}

Grammar G:  S → AB,  A → BB | a,  B → AB | b

Since S derives aabbb, we conclude aabbb ∈ L(G).
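The table above can be reproduced mechanically; a sketch of the set-based CYK membership test on this grammar (the encoding is an assumption, not from the slides):

def cyk_member(w, unary, binary, start="S"):
    """unary: {terminal: set of LHS variables}; binary: {(B, C): set of LHS}.
    True iff w is in L(G) for the CNF grammar G."""
    n = len(w)
    # table[(i, j)] = set of variables deriving the substring w[i..j]
    table = {(i, i): set(unary.get(c, set())) for i, c in enumerate(w)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = set()
            for m in range(i, j):  # split point between the two halves
                for B in table[(i, m)]:
                    for C in table[(m + 1, j)]:
                        cell |= binary.get((B, C), set())
            table[(i, j)] = cell
    return start in table[(0, n - 1)]

# Grammar: S -> AB, A -> BB | a, B -> AB | b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S", "B"}, ("B", "B"): {"A"}}
print(cyk_member("aabbb", unary, binary))  # True: aabbb is in L(G)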

Page 35:

CYK Algorithm for Deciding Context-Free Languages

IDEA: For each substring of a given input x, find all variables which can derive the substring. Once these have been found, telling which variables generate x becomes a simple matter of looking at the grammar, since it is in Chomsky Normal Form.