Statistical Parsing and CKY Algorithm · Dynamic Programming Parsing • CKY (Cocke-Kasami-Younger)...

Preview:

Citation preview

Statistical Parsing and CKY Algorithm

Many slides from Ray Mooney and Michael Collins

Instructor: Wei Xu Ohio State University

TA Office Hours for HW#2• Dreese 390: - 03/28 Tue 10:00AM-12:00 noon - 03/30 Thu 10:00AM-12:00 noon - 04/04 Tue 10:00AM-12:00 noon

• Readings: - textbook http://ciml.info/dl/v0_99/ciml-v0_99-ch17.pdf - slide #28,29.30: https://cocoxu.github.io/courses/

5525_slides_spring17/15_more_memm.pdf

Wuwei Lan

Syntactic Parsing

Syntax

Parsing• Given a string of terminals and a CFG, determine if the string

can be generated by the CFG: - also return a parse tree for the string - also return all possible parse trees for the string

• Must search space of derivations for one that derives the given string. - Top-Down Parsing - Bottom-Up Parsing

Simple CFG for ATIS English

S → NP VP S → Aux NP VP S → VP NP → Pronoun NP → Proper-Noun NP → Det Nominal Nominal → Noun Nominal → Nominal Noun Nominal → Nominal PP VP → Verb VP → Verb NP VP → VP PP PP → Prep NP

Det → the | a | that | this Noun → book | flight | meal | money Verb → book | include | prefer Pronoun → I | he | she | me Proper-Noun → Houston | NWA Aux → does Prep → from | to | on | near | through

Grammar Lexicon

S

VP

Verb NP

book Det Nominal

that Noun

flight

book that flight

Parsing Example

Top Down ParsingS

NP VP

Pronoun

• Start searching space of derivations for the start symbol.

S

NP VP

Pronoun

bookX

Top Down Parsing

S

NP VP

ProperNoun

Top Down Parsing

S

NP VP

ProperNoun

bookX

Top Down Parsing

S

NP VP

Det Nominal

Top Down Parsing

S

NP VP

Det Nominal

bookX

Top Down Parsing

S

Aux NP VP

Top Down Parsing

S

Aux NP VP

bookX

Top Down Parsing

S

VP

Top Down Parsing

S

VP

Verb

Top Down Parsing

S

VP

Verb

book

Top Down Parsing

S

VP

Verb

bookX

that

Top Down Parsing

S

VP

Verb NP

Top Down Parsing

S

VP

Verb NP

book

Top Down Parsing

S

VP

Verb NP

book Pronoun

Top Down Parsing

S

VP

Verb NP

book Pronoun

Xthat

Top Down Parsing

S

VP

Verb NP

book ProperNoun

Top Down Parsing

S

VP

Verb NP

book ProperNoun

Xthat

Top Down Parsing

S

VP

Verb NP

book Det Nominal

Top Down Parsing

S

VP

Verb NP

book Det Nominal

that

Top Down Parsing

S

VP

Verb NP

book Det Nominal

that Noun

Top Down Parsing

S

VP

Verb NP

book Det Nominal

that Noun

flight

Top Down Parsing

book that flight

• Start searching space of reverse derivations from the terminal symbols in the string.

Bottom Up Parsing

book that flight

Noun

Bottom Up Parsing

book that flight

Noun

Nominal

Bottom Up Parsing

book that flight

Noun

Nominal Noun

Nominal

Bottom Up Parsing

book that flight

Noun

Nominal Noun

Nominal

X

Bottom Up Parsing

book that flight

Noun

Nominal PP

Nominal

Bottom Up Parsing

book that flight

Noun Det

Nominal PP

Nominal

Bottom Up Parsing

book that flight

Noun Det

NP

Nominal

Nominal PP

Nominal

Bottom Up Parsing

book that

Noun Det

NP

Nominal

flight

Noun

Nominal PP

Nominal

Bottom Up Parsing

book that

Noun Det

NP

Nominal

flight

Noun

Nominal PP

Nominal

Bottom Up Parsing

book that

Noun Det

NP

Nominal

flight

Noun

S

VP

Nominal PP

Nominal

Bottom Up Parsing

book that

Noun Det

NP

Nominal

flight

Noun

S

VP

X

Nominal PP

Nominal

Bottom Up Parsing

book that

Noun Det

NP

Nominal

flight

Noun

Nominal PP

Nominal

X

Bottom Up Parsing

book that

Verb Det

NP

Nominal

flight

Noun

Bottom Up Parsing

book that

Verb

VP

Det

NP

Nominal

flight

Noun

Bottom Up Parsing

Det

book that

Verb

VP

S

NP

Nominal

flight

Noun

Bottom Up Parsing

Det

book that

Verb

VP

S

XNP

Nominal

flight

Noun

Bottom Up Parsing

book that

Verb

VP

VP

PP

Det

NP

Nominal

flight

Noun

Bottom Up Parsing

book that

Verb

VP

VP

PP

Det

NP

Nominal

flight

Noun

X

Bottom Up Parsing

book that

Verb

VP

Det

NP

Nominal

flight

Noun

NP

Bottom Up Parsing

book that

Verb

VP

Det

NP

Nominal

flight

Noun

Bottom Up Parsing

book that

Verb

VP

Det

NP

Nominal

flight

Noun

S

Bottom Up Parsing

Top Down vs. Bottom Up• Top down never explores options that will not lead to a full

parse, but can explore many options that never connect to the actual sentence. • Bottom up never explores options that do not connect to the

actual sentence but can explore options that can never lead to a full parse. • Relative amounts of wasted search depend on how much the

grammar branches in each direction.

CYK Algorithm

Syntax

Dynamic Programming Parsing• CKY (Cocke-Kasami-Younger) algorithm based on bottom-up

parsing and requires first normalizing the grammar. • First grammar must be converted to Chomsky normal form

(CNF) in which productions must have either exactly 2 non-terminal symbols on the RHS or 1 terminal symbol (lexicon rules). • Parse bottom-up storing phrases formed from all substrings

in a triangular table (chart).

Dynamic Programming• a general algorithm design technique for solving problems

defined by recurrences with overlapping subproblems • first invented by Richard Bellman in 1950s • “programming” here means “planning” or finding an optimal

program, as also seen in the term “linear programming” • Main idea: • setup a recurrence of smaller subproblems • solve subproblems once and record solutions in a table

(avoid any recalculation)

ATIS English Grammar Conversion

S → NP VP S → Aux NP VP

S → VP

NP → Pronoun NP → Proper-Noun NP → Det Nominal Nominal → Noun Nominal → Nominal Noun Nominal → Nominal PP VP → Verb VP → Verb NP VP → VP PP PP → Prep NP

Original Grammar Chomsky Normal FormS → NP VP S → X1 VP X1 → Aux NP S → book | include | prefer S → Verb NP S → VP PP NP → I | he | she | me NP → Houston | NWA NP → Det Nominal Nominal → book | flight | meal | money Nominal → Nominal Noun Nominal → Nominal PP VP → book | include | prefer VP → Verb NP VP → VP PP PP → Prep NP

Note that, although not shown here, original grammar contain all the lexical entires.

Exercise

CKY Parser Book the flight through Houston

i= 0

1

2

3

4

j= 1 2 3 4 5

Cell[i,j] contains all constituents (non-terminals) covering words i +1 through j

CKY Parser

i= 0

1

2

3

4

Cell[i,j] contains all constituents (non-terminals) covering words i +1 through j

Book the flight through Houstonj= 1 2 3 4 5

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

VP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

SVP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

VPSVP

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

VPSVP

S

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

VPSVP

S Parse Tree #1

Book the flight through Houston

CKY Parser

S, VP, Verb, Nominal, Noun

Det

Nominal, Noun

None

NP

VPS

Prep

None

None

None

NP ProperNoun

PP

Nominal

NP

VPSVP

S Parse Tree #2

Book the flight through Houston

The Problem with Parsing: Ambiguity

INPUT:She announced a program to promote safety in trucks and vans

+POSSIBLE OUTPUTS:

S

NP

She

VP

announced NP

NP

a program

VP

to promote NP

safety PP

in NP

trucks and vans

S

NP

She

VP

announced NP

NP

NP

a program

VP

to promote NP

safety PP

in NP

trucks

and NP

vans

S

NP

She

VP

announced NP

NP

a program

VP

to promote NP

NP

sa fety PP

in NP

trucks

and NP

va ns

S

NP

Sh e

VP

announced NP

NP

a program

VP

to promote NP

safety

PP

in NP

trucks and vans

S

NP

She

VP

announced NP

NP

NP

a program

VP

to promote NP

safe ty

PP

in NP

trucks

and NP

va ns

S

NP

She

VP

announced NP

NP

NP

a program

VP

to promote NP

safe ty

PP

in NP

trucks and va ns

And there are more...

Probabilistic Context Free Grammars (PCFG)

Syntax

split point

O(n3|N|3)

O(n2) for l, i choices

S(saw)

NP(man)

DT(the)

the

NN(man)

man

VP(saw)

VP(saw)

Vt(saw)

saw

NP(dog)

DT(the)

the

NN(dog)

dog

PP(with)

IN(with)

with

NP(telescope)

DT(the)

the

NN(telescope)

telescope

p(t) = q(S(saw) !2 NP(man) VP(saw))⇥q(NP(man) !2 DT(the) NN(man))⇥q(VP(saw) !1 VP(saw) PP(with))⇥q(VP(saw) !1 Vt(saw) NP(dog))⇥q(PP(with) !1 IN(with) NP(telescope))⇥ . . .

Parsing with Lexicalized CFGs

I The new form of grammar looks just like a Chomsky normalform CFG, but with potentially O(|⌃|2 ⇥ |N |3) possible rules.

I Naively, parsing an n word sentence using the dynamicprogramming algorithm will take O(n3|⌃|2|N |3) time. But|⌃| can be huge!!

I Crucial observation: at most O(n2 ⇥ |N |3) rules can beapplicable to a given sentence w1, w2, . . . wn of length n.This is because any rules which contain a lexical item that isnot one of w1 . . . wn, can be safely discarded.

I The result: we can parse in O(n5|N |3) time.

Dependency Parsing

Syntax

Recommended