Parsing Algorithms Yoav Goldberg (with slides by Michael Collins, Julia Hockenmaier)

u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms


Parsing: recovering the constituents of a sentence.

Why is parsing hard?

Ambiguity

Fat people eat candy

(S (NP (Adj Fat) (Nn people))
   (VP (Vb eat) (NP (Nn candy))))

Fat people eat accumulates

(S (NP (Nn Fat) (AdjP (Nn people) (Vb eat)))
   (VP (Vb accumulates)))


Why is parsing hard?

Real sentences are long...

“Former Beatle Paul McCartney today was ordered to pay nearly $50M to his estranged wife as their bitter divorce battle came to an end.”

“Welcome to our Columbus hotels guide, where you’ll find honest, concise hotel reviews, all discounts, a lowest rate guarantee, and no booking fees.”

Let’s learn how to parse

Context Free Grammars

A context-free grammar is a tuple $G = (N, \Sigma, R, S)$ where:
- $N$ is a set of non-terminal symbols
- $\Sigma$ is a set of terminal symbols
- $R$ is a set of rules of the form $X \to Y_1 Y_2 \cdots Y_n$, for $n \ge 0$, $X \in N$, $Y_i \in (N \cup \Sigma)$
- $S \in N$ is a special start symbol

Context Free Grammars

A simple grammar

$N = \{S, NP, VP, Adj, Det, Vb, Noun\}$
$\Sigma = \{fruit, flies, like, a, banana, tomato, angry\}$
$S$ = ‘S’
$R$ =
    S → NP VP
    NP → Adj Noun
    NP → Det Noun
    VP → Vb NP
    Adj → fruit
    Noun → flies
    Vb → like
    Det → a
    Noun → banana
    Noun → tomato
    Adj → angry
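The four components of the simple grammar can be written down directly as data. The sketch below is one possible encoding, not something taken from the slides: the names `N`, `SIGMA`, `START`, `R` and the helper `is_well_formed` are assumptions made for illustration. The helper only checks the constraints stated in the definition ($S \in N$, $X \in N$, $Y_i \in N \cup \Sigma$).

```python
# Hypothetical encoding of the slide's grammar G = (N, Sigma, R, S).
N = {"S", "NP", "VP", "Adj", "Det", "Vb", "Noun"}
SIGMA = {"fruit", "flies", "like", "a", "banana", "tomato", "angry"}
START = "S"
R = [
    ("S", ("NP", "VP")),
    ("NP", ("Adj", "Noun")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Vb", "NP")),
    ("Adj", ("fruit",)),
    ("Noun", ("flies",)),
    ("Vb", ("like",)),
    ("Det", ("a",)),
    ("Noun", ("banana",)),
    ("Noun", ("tomato",)),
    ("Adj", ("angry",)),
]


def is_well_formed(nonterminals, terminals, rules, start):
    """Check the constraints from the CFG definition (illustrative helper)."""
    if start not in nonterminals:
        return False
    symbols = nonterminals | terminals
    # Every rule X -> Y1 ... Yn needs X in N and each Yi in N or Sigma.
    return all(
        lhs in nonterminals and all(y in symbols for y in rhs)
        for lhs, rhs in rules
    )


print(is_well_formed(N, SIGMA, R, START))
```

Storing each rule as an `(lhs, rhs)` pair keeps duplicates of the same left-hand side (such as the two NP rules) as separate entries.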

Left-most derivations

A left-most derivation is a sequence of strings $s_1, \cdots, s_n$ where:
- $s_1 = S$, the start symbol
- $s_n \in \Sigma^*$, meaning $s_n$ consists only of terminal symbols
- Each $s_i$ for $i = 2 \cdots n$ is derived from $s_{i-1}$ by picking the left-most non-terminal $X$ in $s_{i-1}$ and replacing it by some $\beta$, where $X \to \beta$ is a rule in $R$.

For example: [S], [NP VP], [Adj Noun VP], [fruit Noun VP], [fruit flies VP], [fruit flies Vb NP], [fruit flies like NP], [fruit flies like Det Noun], [fruit flies like a Noun], [fruit flies like a banana]
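The definition can be replayed mechanically: at every step, find the left-most non-terminal and rewrite it with the next rule. The following is a minimal sketch under that reading; `NONTERMINALS`, `RULES_USED` and `leftmost_derivation` are illustrative names, and the rule sequence is the one that yields "fruit flies like a banana".

```python
NONTERMINALS = {"S", "NP", "VP", "Adj", "Det", "Vb", "Noun"}

# Rules in the order they are applied in the example derivation.
RULES_USED = [
    ("S", ["NP", "VP"]),
    ("NP", ["Adj", "Noun"]),
    ("Adj", ["fruit"]),
    ("Noun", ["flies"]),
    ("VP", ["Vb", "NP"]),
    ("Vb", ["like"]),
    ("NP", ["Det", "Noun"]),
    ("Det", ["a"]),
    ("Noun", ["banana"]),
]


def leftmost_derivation(start, rules):
    """Apply each rule to the left-most non-terminal and record every string."""
    current = [start]
    steps = [list(current)]
    for lhs, rhs in rules:
        # Position of the left-most non-terminal in the current string.
        idx = next(i for i, sym in enumerate(current) if sym in NONTERMINALS)
        if current[idx] != lhs:
            raise ValueError(f"{lhs} is not the left-most non-terminal")
        current = current[:idx] + list(rhs) + current[idx + 1:]
        steps.append(list(current))
    return steps


steps = leftmost_derivation("S", RULES_USED)
for s in steps:
    print(" ".join(s))
```

Applying the rules in a different order would either fail the left-most check or describe a different derivation.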

Left-most derivation example

Derivation                      Rule applied
S
NP VP                           S → NP VP
Adj Noun VP                     NP → Adj Noun
fruit Noun VP                   Adj → fruit
fruit flies VP                  Noun → flies
fruit flies Vb NP               VP → Vb NP
fruit flies like NP             Vb → like
fruit flies like Det Noun       NP → Det Noun
fruit flies like a Noun         Det → a
fruit flies like a banana       Noun → banana

- The resulting derivation can be written as a tree.
- Many trees can be generated.


Context Free Grammars

A simple grammar

S → NP VP
NP → Adj Noun
NP → Det Noun
VP → Vb NP

Adj → fruit
Noun → flies
Vb → like
Det → a
Noun → banana
Noun → tomato
Adj → angry
...

Examples

(S (NP (Adj fruit) (Noun flies)) (VP (Vb like) (NP (Det a) (Noun banana))))
(S (NP (Adj angry) (Noun flies)) (VP (Vb like) (NP (Det a) (Noun banana))))
(S (NP (Adj angry) (Noun flies)) (VP (Vb like) (NP (Det a) (Noun tomato))))
(S (NP (Adj angry) (Noun banana)) (VP (Vb like) (NP (Det a) (Noun tomato))))
(S (NP (Det a) (Noun banana)) (VP (Vb like) (NP (Det a) (Noun tomato))))
(S (NP (Det a) (Noun banana)) (VP (Vb like) (NP (Adj angry) (Noun banana))))
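The example trees hint that the grammar happily generates odd sentences too. As a rough illustration, the sketch below enumerates every sentence derivable from S using only the eleven rules listed (the "..." on the slide is not filled in). `GRAMMAR` and `expand` are hypothetical names, and the recursion terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# Only the rules that are explicitly listed; the trailing "..." is left out.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "Noun"], ["Det", "Noun"]],
    "VP": [["Vb", "NP"]],
    "Adj": [["fruit"], ["angry"]],
    "Noun": [["flies"], ["banana"], ["tomato"]],
    "Vb": [["like"]],
    "Det": [["a"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol` (finite grammars only)."""
    if symbol not in GRAMMAR:  # a terminal expands to itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(y) for y in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))
print(sentences[:3])
```

Both "fruit flies like a banana" and "a banana like angry banana" appear in the output, which is the over-generation the example trees illustrate.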

Parsing with (P)CFGs

Parsing with CFGs

Let’s assume...
- Let’s assume natural language is generated by a CFG.
- ... and let’s assume we have the grammar.
- Then parsing is easy: given a sentence, find the chain of derivations starting from S that generates it.

Problem: Natural language is NOT generated by a CFG.
Solution: We assume really hard that it is.

Problem: We don’t have the grammar.
Solution: We’ll ask a genius linguist to write it!

Problem: How do we find the chain of derivations?
Solution: With dynamic programming! (soon)

Problem: Real grammar: hundreds of possible derivations per sentence.
Solution: No problem! We’ll choose the best one. (sooner)

Obtaining a Grammar

Let a genius linguist write it
- Hard. Many rules, many complex interactions.
- Genius linguists don’t grow on trees!

An easier way: ask a linguist to grow trees
- Ask a linguist to annotate sentences with tree structure.
- (This need not be a genius; smart is enough.)
- Then extract the rules from the annotated trees.

Treebanks
- English Treebank: 40k sentences, manually annotated with tree structure.
- Hebrew Treebank: about 5k sentences


Treebank Sentence Example

( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))
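Bracketed trees like this are easy to read into a nested structure. The sketch below is one simple way to do it, assuming well-formed input; `parse_tree` and `leaves` are hypothetical helper names, and `TEXT` is a shortened fragment of the sentence above rather than the full annotation. The empty outer bracket is kept as a node with an empty label.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree into (label, children); leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):  # the outer wrapper has no label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def leaves(tree):
    """Collect the words at the leaves, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


# Shortened fragment of the treebank sentence, for illustration only.
TEXT = "( (S (NP-SBJ (NNP Pierre) (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)))) (. .)) )"
tree = parse_tree(TEXT)
print(leaves(tree))
```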

Supervised Learning from a Treebank

((fruit/ADJ flies/NN) (like/VB (a/DET banana/NN)))
(time/NN (flies/VB (like/IN (an/DET (arrow/NN)))))
...

Extracting CFG from Trees

- The leaves of the trees define $\Sigma$
- The internal nodes of the trees define $N$
- Add a special S symbol on top of all trees
- Each node and its children is a rule in $R$

Extracting Rules

(S (NP (Adj fruit) (Noun flies))
   (VP (Vb like) (NP (Det a) (Noun banana))))

S → NP VP
NP → Adj Noun
Adj → fruit
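Reading rules off a tree is a short recursive walk: each node contributes one rule whose right-hand side is the sequence of its children's labels. The following sketch assumes trees are stored as `(label, children)` tuples with plain strings as leaves; that representation and the names `TREE` and `extract_rules` are illustrative choices.

```python
from collections import Counter

# The example tree as (label, children) tuples; leaves are plain strings.
TREE = ("S", [
    ("NP", [("Adj", ["fruit"]), ("Noun", ["flies"])]),
    ("VP", [("Vb", ["like"]),
            ("NP", [("Det", ["a"]), ("Noun", ["banana"])])]),
])


def extract_rules(tree, counts=None):
    """Count one rule (label -> child labels) for every internal node."""
    counts = Counter() if counts is None else counts
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, counts)
    return counts


rules = extract_rules(TREE)
for (lhs, rhs), count in rules.items():
    print(lhs, "->", " ".join(rhs), count)
```

Keeping counts rather than a plain set is a design choice: the same counts can later be normalized into rule probabilities.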


From CFG to PCFG

- English is NOT generated from a CFG ⇒ it’s generated by a PCFG!
- PCFG: probabilistic context-free grammar. Just like a CFG, but each rule has an associated probability.
- All probabilities for the same LHS sum to 1.
- Multiplying all the rule probs in a derivation gives the probability of the derivation.
- We want the tree with maximum probability.

More Formally

$$P(\text{tree}, \text{sent}) = \prod_{l \to r \,\in\, \text{deriv}(\text{tree})} p(l \to r)$$

$$\widehat{\text{tree}} = \arg\max_{\text{tree} \in \text{trees}(\text{sent})} P(\text{tree} \mid \text{sent}) = \arg\max_{\text{tree} \in \text{trees}(\text{sent})} P(\text{tree}, \text{sent})$$
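The step from the conditional to the joint probability in the arg max is not spelled out on the slide. A short justification, using only the definition of conditional probability (the symbol $\text{tree}'$ is introduced here for the summation):

```latex
P(\text{tree} \mid \text{sent})
  = \frac{P(\text{tree}, \text{sent})}{P(\text{sent})},
\qquad
P(\text{sent}) = \sum_{\text{tree}' \in \text{trees}(\text{sent})} P(\text{tree}', \text{sent})
```

Since $P(\text{sent})$ is the same positive constant for every candidate tree of a given sentence, dividing by it does not change which tree attains the maximum.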


PCFG Example

A simple PCFG

1.0  S → NP VP
0.3  NP → Adj Noun
0.7  NP → Det Noun
1.0  VP → Vb NP

0.2  Adj → fruit
0.2  Noun → flies
1.0  Vb → like
1.0  Det → a
0.4  Noun → banana
0.4  Noun → tomato
0.8  Adj → angry

Example

(S (NP (Adj fruit) (Noun flies))
   (VP (Vb like) (NP (Det a) (Noun banana))))

$1 \cdot 0.3 \cdot 0.2 \cdot 0.7 \cdot 1.0 \cdot 0.2 \cdot 1 \cdot 1 \cdot 0.4 = 0.00336$
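The product on the slide can be checked by walking the tree and multiplying the probability of the rule used at each node. This is a minimal sketch; the `(label, children)` tree encoding and the names `PCFG`, `TREE` and `tree_probability` are assumptions made for the example.

```python
# Rule probabilities copied from the simple PCFG.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Adj", "Noun")): 0.3,
    ("NP", ("Det", "Noun")): 0.7,
    ("VP", ("Vb", "NP")): 1.0,
    ("Adj", ("fruit",)): 0.2,
    ("Noun", ("flies",)): 0.2,
    ("Vb", ("like",)): 1.0,
    ("Det", ("a",)): 1.0,
    ("Noun", ("banana",)): 0.4,
    ("Noun", ("tomato",)): 0.4,
    ("Adj", ("angry",)): 0.8,
}

# "fruit flies like a banana" as (label, children); leaves are strings.
TREE = ("S", [
    ("NP", [("Adj", ["fruit"]), ("Noun", ["flies"])]),
    ("VP", [("Vb", ["like"]),
            ("NP", [("Det", ["a"]), ("Noun", ["banana"])])]),
])


def tree_probability(tree):
    """Multiply the probabilities of all rules used in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    prob = PCFG[(label, rhs)]
    for child in children:
        if isinstance(child, tuple):
            prob *= tree_probability(child)
    return prob


p = tree_probability(TREE)
print(round(p, 5))
```

For long sentences a practical implementation would usually sum log-probabilities instead, to avoid numerical underflow.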


Parsing with PCFG

- Parsing with a PCFG is finding the most probable derivation for a given sentence.
- This can be done quite efficiently with dynamic programming (the CKY algorithm)

Obtaining the probabilities

- We estimate them from the Treebank.
- $P(\text{LHS} \to \text{RHS}) = \dfrac{\text{count}(\text{LHS} \to \text{RHS})}{\text{count}(\text{LHS} \to \diamond)}$, where $\diamond$ stands for any right-hand side
- We can also add smoothing and backoff, as before.
- Dealing with unknown words: like in the HMM
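The relative-frequency estimate amounts to dividing each rule count by the total count of rules sharing its left-hand side. A small sketch, without smoothing or backoff; the counts in `COUNTS` are made up for illustration and `estimate_pcfg` is a hypothetical name.

```python
from collections import Counter


def estimate_pcfg(rule_counts):
    """Relative-frequency estimate: count(LHS -> RHS) / count(LHS -> anything)."""
    lhs_totals = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {
        (lhs, rhs): count / lhs_totals[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


# Made-up counts, not taken from any real treebank.
COUNTS = Counter({
    ("NP", ("Adj", "Noun")): 3,
    ("NP", ("Det", "Noun")): 7,
    ("VP", ("Vb", "NP")): 5,
})

probs = estimate_pcfg(COUNTS)
for (lhs, rhs), prob in probs.items():
    print(lhs, "->", " ".join(rhs), prob)
```

By construction, the estimated probabilities for each left-hand side sum to 1.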


The CKY algorithm

The Problem

Input
- Sentence (a list of words)
  - n: sentence length
- CFG Grammar (with weights on rules)
  - g: number of non-terminal symbols

Output
- A parse tree / the best parse tree

But...
- Exponentially many possible parse trees!

Solution
- Dynamic Programming!

CKY

Cocke (196?), Kasami (1965), Younger (1967)

3 Interesting Problems

- Recognition
  - Can this string be generated by the grammar?
- Parsing
  - Show me a possible derivation...
- Disambiguation
  - Show me THE BEST derivation

CKY can do all of these in polynomial time
- For any CNF grammar


CNF

Chomsky Normal Form

Definition
A CFG is in CNF if it only has rules like:
- $A \to B\ C$
- $A \to \alpha$

$A, B, C$ are non-terminal symbols; $\alpha$ is a terminal symbol (a word...)

- All terminal symbols are RHS of unary rules
- All non-terminal symbols are RHS of binary rules

CKY can be easily extended to also handle unary rules: $A \to B$
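The CNF condition is easy to test rule by rule. The sketch below checks the strict definition only (binary rules over non-terminals, or a single terminal), so it rejects the unary extension $A \to B$; `is_cnf`, `NONTERMINALS` and `RULES` are illustrative names, and the rules are those of the simple grammar used earlier.

```python
NONTERMINALS = {"S", "NP", "VP", "Adj", "Det", "Vb", "Noun"}

RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Adj", "Noun")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Vb", "NP")),
    ("Adj", ("fruit",)),
    ("Noun", ("flies",)),
    ("Vb", ("like",)),
    ("Det", ("a",)),
    ("Noun", ("banana",)),
    ("Noun", ("tomato",)),
    ("Adj", ("angry",)),
]


def is_cnf(rules, nonterminals):
    """True if every rule is A -> B C (non-terminals) or A -> word (terminal)."""
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        binary = len(rhs) == 2 and all(y in nonterminals for y in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            return False
    return True


print(is_cnf(RULES, NONTERMINALS))
print(is_cnf(RULES + [("S", ("NP", "PP", "VP", "PP"))], NONTERMINALS | {"PP"}))
```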

Binarization

Fact
- Any CFG can be converted to CNF

Specifically for natural language grammars
- We already have $A \to \alpha$
  - ($A \to \alpha\,\beta$ is also easy to handle)
- Unary rules ($A \to B$) are OK
- Only problem: S → NP PP VP PP

Binarization
S → NP  NP|PP.VP.PP
NP|PP.VP.PP → PP  NP.PP|VP.PP
NP.PP|VP.PP → VP  NP.PP.VP|PP
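A long rule can be split left to right, introducing one helper symbol per step. The sketch below borrows the slide's "seen|remaining" naming for helper symbols but is a variation, not the slide's exact scheme: it ends with an ordinary binary rule over the last two symbols instead of introducing a final helper. The function name `binarize` is an assumption.

```python
def binarize(lhs, rhs):
    """Split lhs -> rhs (len(rhs) >= 2) into binary rules.

    Helper symbols are named "seen|remaining", joined with dots.
    """
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        seen, remaining = rhs[: i + 1], rhs[i + 1:]
        helper = ".".join(seen) + "|" + ".".join(remaining)
        rules.append((current, (rhs[i], helper)))
        current = helper
    # The last two symbols form a plain binary rule.
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules


for left, right in binarize("S", ["NP", "PP", "VP", "PP"]):
    print(left, "->", " ".join(right))
```

Encoding the already-seen and remaining symbols in the helper name keeps enough information to undo the binarization after parsing.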

Page 77: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Binarization

FactI Any CFG grammar can be converted to CNF form

Speficifally for Natural Language grammarsI We already have A ! ↵

I (A ! ↵ � is also easy to handle)I Unary rules (A ! B) are OKI Only problem:S ! NP PP VP PP

BinarizationS ! NP NP|PP.VP.PPNP|PP.VP.PP ! PP NP.PP|VP.PPNP.PP|VP.PP ! VP NP.PP.VP|PP

34 / 48

Page 78: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Binarization

FactI Any CFG grammar can be converted to CNF form

Speficifally for Natural Language grammarsI We already have A ! ↵

I (A ! ↵ � is also easy to handle)

I Unary rules (A ! B) are OKI Only problem:S ! NP PP VP PP

BinarizationS ! NP NP|PP.VP.PPNP|PP.VP.PP ! PP NP.PP|VP.PPNP.PP|VP.PP ! VP NP.PP.VP|PP

34 / 48

Page 79: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Binarization

FactI Any CFG grammar can be converted to CNF form

Speficifally for Natural Language grammarsI We already have A ! ↵

I (A ! ↵ � is also easy to handle)I Unary rules (A ! B) are OK

I Only problem:S ! NP PP VP PP

BinarizationS ! NP NP|PP.VP.PPNP|PP.VP.PP ! PP NP.PP|VP.PPNP.PP|VP.PP ! VP NP.PP.VP|PP

34 / 48

Page 80: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Binarization

FactI Any CFG grammar can be converted to CNF form

Speficifally for Natural Language grammarsI We already have A ! ↵

I (A ! ↵ � is also easy to handle)I Unary rules (A ! B) are OKI Only problem:S ! NP PP VP PP

BinarizationS ! NP NP|PP.VP.PPNP|PP.VP.PP ! PP NP.PP|VP.PPNP.PP|VP.PP ! VP NP.PP.VP|PP

34 / 48

Page 81: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Binarization

Fact
- Any CFG can be converted to CNF

Specifically for natural language grammars
- We already have A → α
- (A → α β is also easy to handle)
- Unary rules (A → B) are OK
- Only problem: S → NP PP VP PP

Binarization
S → NP NP|PP.VP.PP
NP|PP.VP.PP → PP NP.PP|VP.PP
NP.PP|VP.PP → VP NP.PP.VP|PP
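The binarization step can be sketched in a few lines of Python. This is a minimal illustration, not the slides' exact scheme: the function name `binarize` and the naming of intermediate symbols (parent label, then `|`, then the children still to be produced) are assumptions chosen for brevity, whereas the slide also records the children already consumed.

```python
def binarize(lhs, rhs):
    """Right-binarize one rule lhs -> rhs (len(rhs) >= 2) into binary rules.

    Intermediate symbols are named "<lhs>|<remaining children>"; this naming
    is a simplification and differs from the scheme shown on the slide.
    """
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        remaining = rhs[i + 1:]
        new_sym = f"{lhs}|{'.'.join(remaining)}"
        rules.append((current, (rhs[i], new_sym)))
        current = new_sym
    # The last two children form an ordinary binary rule.
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules


for parent, children in binarize("S", ["NP", "PP", "VP", "PP"]):
    print(parent, "->", " ".join(children))
```

A rule that is already binary passes through unchanged, so the function can be applied to every rule of the grammar.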


Finally, CKY

Recognition

- Main idea:
  - Build parse tree from bottom up
  - Combine built trees to form bigger trees using grammar rules
  - When left with a single tree, verify root is S
- Exponentially many possible trees...
  - Search over all of them in polynomial time using DP
  - Shared structure – smaller trees


Main Idea

If we know:
- w_i ... w_j is an NP
- w_{j+1} ... w_k is a VP

and the grammar has the rule:
- S → NP VP

Then we know:
- S can derive w_i ... w_k


Data Structure

(Half of) a two-dimensional array (n x n), drawn on its side.

Each cell: all nonterminals that can derive word i to word j.
Imagine each cell as a g-dimensional array.

Sue saw her girl with a telescope


Filling the table

Sue saw her girl with a telescope


Handling Unary rules?

Sue saw her girl with a telescope


Which Order?

Sue saw her boy with a telescope
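One order that works is to fill cells by increasing span length, so that both halves of every split are already available. The recognizer below is a minimal sketch under that assumption; the function name, the chart layout (a dictionary keyed by half-open spans) and the toy grammar are all illustrative choices, and unary rules are left out.

```python
from collections import defaultdict


def cky_recognize(words, lexicon, binary_rules, start="S"):
    """Return True if `start` can derive `words` under a CNF grammar.

    lexicon: {word: [preterminals]}; binary_rules: [(A, (B, C)), ...].
    """
    n = len(words)
    chart = defaultdict(set)  # chart[i, j] = nonterminals deriving words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1] = set(lexicon.get(w, ()))
    for length in range(2, n + 1):            # short spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # split point
                for a, (b, c) in binary_rules:
                    if b in chart[i, k] and c in chart[k, j]:
                        chart[i, j].add(a)
    return start in chart[0, n]


# Hypothetical toy grammar for the example sentence.
lexicon = {"Sue": ["NP"], "saw": ["Vt"], "her": ["DT"], "girl": ["NN"],
           "with": ["IN"], "a": ["DT"], "telescope": ["NN"]}
rules = [("S", ("NP", "VP")), ("VP", ("Vt", "NP")), ("VP", ("VP", "PP")),
         ("NP", ("DT", "NN")), ("NP", ("NP", "PP")), ("PP", ("IN", "NP"))]
sentence = "Sue saw her girl with a telescope".split()
print(cky_recognize(sentence, lexicon, rules))
```

The three nested loops over span length, start position and split point give the n³ factor; the loop over rules gives the grammar factor.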


Complexity?

- $n^2 g$ cells to fill
- $g^2 n$ ways to fill each one

$O(g^3 n^3)$


A Note on Implementation

A smart implementation can reduce the runtime:
- Worst case is still $O(g^3 n^3)$, but it helps in practice
- No need to check all grammar rules A → B C at each location:
  - only those compatible with B or C of the current split
  - prune binarized symbols which are too long for the current position
  - once you have found one way to derive A, you can break out of the loop
  - order grammar rules from frequent to infrequent
- Need both efficient random access and iteration over possible symbols
  - Keep both hash and list, implemented as arrays


Finding a parse

Parsing – we want to actually find a parse tree

Easy: also keep a possible split point for each NT


PCFG Parsing and Disambiguation

Disambiguation – we want THE BEST parse tree

Easy: for each NT, keep best split point, and score.
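Keeping a best score and a best split point per nonterminal can be sketched as follows. This is an illustrative Viterbi-style CKY, not the course's reference code: the function name, the dictionary layout and the rule weights are assumptions, and the toy weights are made-up numbers rather than a properly normalized PCFG.

```python
import math


def viterbi_cky(words, lexicon, binary_rules, start="S"):
    """lexicon: {word: {tag: prob}}; binary_rules: {(A, B, C): prob}."""
    n = len(words)
    best = {}  # (i, j, A) -> best log-probability of A deriving words[i:j]
    back = {}  # (i, j, A) -> (k, B, C): best split point and children
    for i, w in enumerate(words):
        for tag, p in lexicon.get(w, {}).items():
            best[i, i + 1, tag] = math.log(p)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in binary_rules.items():
                    if (i, k, b) in best and (k, j, c) in best:
                        s = math.log(p) + best[i, k, b] + best[k, j, c]
                        if s > best.get((i, j, a), -math.inf):
                            best[i, j, a] = s
                            back[i, j, a] = (k, b, c)

    def build(i, j, a):
        if (i, j, a) not in back:          # preterminal over a single word
            return (a, words[i])
        k, b, c = back[i, j, a]
        return (a, build(i, k, b), build(k, j, c))

    if (0, n, start) not in best:
        return None, -math.inf
    return build(0, n, start), best[0, n, start]


# Hypothetical weights; single-word NPs are tagged directly to avoid unary rules.
lexicon = {"workers": {"NP": 1.0}, "dumped": {"VBD": 1.0}, "sacks": {"NP": 1.0},
           "into": {"IN": 1.0}, "a": {"DT": 1.0}, "bin": {"NN": 1.0}}
rules = {("S", "NP", "VP"): 1.0, ("VP", "VBD", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
         ("NP", "NP", "PP"): 0.2, ("NP", "DT", "NN"): 0.8, ("PP", "IN", "NP"): 1.0}
tree, logp = viterbi_cky("workers dumped sacks into a bin".split(), lexicon, rules)
print(tree, math.exp(logp))
```

With these weights the verb attachment wins, because the two candidate trees differ only in using VP → VP PP (0.3) versus NP → NP PP (0.2).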


Implementation Tricks
#1: sum instead of product

As in the HMM: multiplying probabilities is evil
- keeping the product of many floating point numbers is dangerous, because the product gets really small
  - either grow in runtime
  - or lose precision (underflowing to 0)
  - either way, multiplying floats is expensive

Solution: use sum of logs instead

- remember: log(p1 * p2) = log(p1) + log(p2)
⇒ Use log probabilities instead of probabilities
⇒ add instead of multiply
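A small numeric illustration of the underflow problem, with made-up values: one hundred probabilities of 1e-5 each have a true product of 1e-500, which is below the range of a double-precision float, while the sum of their logs is an unremarkable number.

```python
import math

probs = [1e-5] * 100  # made-up rule probabilities

product = 1.0
for p in probs:
    product *= p  # silently underflows to 0.0 partway through

log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # roughly -1151.29
```

Comparing two candidate parses by `log_sum` still works, whereas both products would compare equal at 0.0.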


The big question

Does this work?


Evaluation


Parsing Evaluation

- Let's assume we have a parser; how do we know how good it is?
  ⇒ Compare output trees to gold trees.
- But how do we compare trees?
  - Credit of 1 if the tree is correct and 0 otherwise is too harsh.
- Represent each tree as a set of labeled spans.
  - NP from word 1 to word 5.
  - VP from word 3 to word 4.
  - S from word 1 to word 23.
  - ...
- Measure Precision, Recall and F1 over these spans, as in the segmentation case.


Evaluation: Representing Trees as Constituents

(S (NP (DT the) (NN lawyer))
   (VP (Vt questioned)
       (NP (DT the) (NN witness))))

Label   Start Point   End Point
NP      1             2
NP      4             5
VP      3             5
S       1             5

(by Mike Collins)


Precision and Recall

Gold standard:

Label   Start Point   End Point
NP      1             2
NP      4             5
NP      4             8
PP      6             8
NP      7             8
VP      3             8
S       1             8

Parse output:

Label   Start Point   End Point
NP      1             2
NP      4             5
PP      6             8
NP      7             8
VP      3             8
S       1             8

- G = number of constituents in gold standard = 7
- P = number in parse output = 6
- C = number correct = 6

$\text{Recall} = 100\% \times \frac{C}{G} = 100\% \times \frac{6}{7}$

$\text{Precision} = 100\% \times \frac{C}{P} = 100\% \times \frac{6}{6}$
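The computation on the two span tables above can be reproduced with set operations. This is a small sketch; the function name `prf` is illustrative, and representing a tree as a Python set assumes no labeled span occurs twice in the same tree.

```python
gold = {("NP", 1, 2), ("NP", 4, 5), ("NP", 4, 8), ("PP", 6, 8),
        ("NP", 7, 8), ("VP", 3, 8), ("S", 1, 8)}
pred = {("NP", 1, 2), ("NP", 4, 5), ("PP", 6, 8),
        ("NP", 7, 8), ("VP", 3, 8), ("S", 1, 8)}


def prf(gold, pred):
    """Precision, recall and F1 over labeled spans (label, start, end)."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


precision, recall, f1 = prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here precision is 6/6, recall is 6/7, and F1 is their harmonic mean, 12/13.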


Parsing Evaluation

- Is this a good measure?
- Why? Why not?


Parsing Evaluation

How well does the PCFG parser we learned do?

Not very well: about 73% F1 score.


Problems with PCFGs


Weaknesses of Probabilistic Context-Free Grammars

Michael Collins, Columbia University


Weaknesses of PCFGs

- Lack of sensitivity to lexical information

- Lack of sensitivity to structural frequencies


(S (NP (NNP IBM))
   (VP (Vt bought)
       (NP (NNP Lotus))))

$p(t) = q(\text{S} \to \text{NP VP}) \times q(\text{NNP} \to \text{IBM}) \times q(\text{VP} \to \text{Vt NP}) \times q(\text{Vt} \to \text{bought}) \times q(\text{NP} \to \text{NNP}) \times q(\text{NNP} \to \text{Lotus}) \times q(\text{NP} \to \text{NNP})$


Another Case of PP Attachment Ambiguity

(a) (S (NP (NNS workers))
       (VP (VP (VBD dumped)
               (NP (NNS sacks)))
           (PP (IN into)
               (NP (DT a) (NN bin)))))

(b) (S (NP (NNS workers))
       (VP (VBD dumped)
           (NP (NP (NNS sacks))
               (PP (IN into)
                   (NP (DT a) (NN bin))))))


Rules (a)            Rules (b)
S → NP VP            S → NP VP
NP → NNS             NP → NNS
VP → VP PP           NP → NP PP
VP → VBD NP          VP → VBD NP
NP → NNS             NP → NNS
PP → IN NP           PP → IN NP
NP → DT NN           NP → DT NN
NNS → workers        NNS → workers
VBD → dumped         VBD → dumped
NNS → sacks          NNS → sacks
IN → into            IN → into
DT → a               DT → a
NN → bin             NN → bin

If q(NP → NP PP) > q(VP → VP PP) then (b) is more probable, else (a) is more probable.
The attachment decision is completely independent of the words.


A Case of Coordination Ambiguity

(a) (NP (NP (NP (NNS dogs))
            (PP (IN in)
                (NP (NNS houses))))
        (CC and)
        (NP (NNS cats)))

(b) (NP (NP (NNS dogs))
        (PP (IN in)
            (NP (NP (NNS houses))
                (CC and)
                (NP (NNS cats)))))


Rules (a)            Rules (b)
NP → NP CC NP        NP → NP CC NP
NP → NP PP           NP → NP PP
NP → NNS             NP → NNS
PP → IN NP           PP → IN NP
NP → NNS             NP → NNS
NP → NNS             NP → NNS
NNS → dogs           NNS → dogs
IN → in              IN → in
NNS → houses         NNS → houses
CC → and             CC → and
NNS → cats           NNS → cats

Here the two parses have identical rules, and therefore have identical probability under any assignment of PCFG rule probabilities.


Structural Preferences: Close Attachment

(a) (NP (NP NN)
        (PP IN
            (NP (NP NN)
                (PP IN (NP NN)))))

(b) (NP (NP (NP NN)
            (PP IN (NP NN)))
        (PP IN (NP NN)))

- Example: president of a company in Africa
- Both parses have the same rules, and therefore receive the same probability under a PCFG
- "Close attachment" (structure (a)) is twice as likely in Wall Street Journal text.


Lexicalized PCFGs

PCFG Problem 1
Lack of sensitivity to lexical information (words)

Solution
- Make PCFG aware of words (lexicalized PCFG)
- Main idea: head words


Head Words

Each constituent has one word that captures its "essence".

- (S John saw the young boy with the large hat)
- (VP saw the young boy with the large hat)
- (NP the young boy with the large hat)
- (NP the large hat)
- (PP with the large hat)
  - hat is the "semantic head"
  - with is the "functional head"
  - (it is common to choose the functional head)


Heads in Context-Free Rules

Add annotations specifying the “head” of each rule:

S ⇒ NP VP
VP ⇒ Vi
VP ⇒ Vt NP
VP ⇒ VP PP
NP ⇒ DT NN
NP ⇒ NP PP
PP ⇒ IN NP

Vi ⇒ sleeps
Vt ⇒ saw
NN ⇒ man
NN ⇒ woman
NN ⇒ telescope
DT ⇒ the
IN ⇒ with
IN ⇒ in


More about Heads

- Each context-free rule has one "special" child that is the head of the rule, e.g.,

  S ⇒ NP VP (VP is the head)
  VP ⇒ Vt NP (Vt is the head)
  NP ⇒ DT NN NN (NN is the head)

- A core idea in syntax (e.g., see X-bar Theory, Head-Driven Phrase Structure Grammar)

- Some intuitions:
  - The central sub-constituent of each rule.
  - The semantic predicate in each rule.


Rules which Recover Heads: An Example for NPs

If the rule contains NN, NNS, or NNP: Choose the rightmost NN, NNS, or NNP

Else If the rule contains an NP: Choose the leftmost NP

Else If the rule contains a JJ: Choose the rightmost JJ

Else If the rule contains a CD: Choose the rightmost CD

Else Choose the rightmost child

e.g.,
NP ⇒ DT NNP NN
NP ⇒ DT NN NNP
NP ⇒ NP PP
NP ⇒ DT JJ
NP ⇒ DT


Rules which Recover Heads: An Example for VPs

If the rule contains Vi or Vt: Choose the leftmost Vi or Vt

Else If the rule contains a VP: Choose the leftmost VP

Else Choose the leftmost child

e.g.,
VP ⇒ Vt NP
VP ⇒ VP PP
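The two priority lists translate directly into code. The sketch below follows the NP and VP rules as stated on the slides; the function names and the representation of a rule's right-hand side as a list of child labels are illustrative assumptions.

```python
NOUN_TAGS = {"NN", "NNS", "NNP"}


def np_head(children):
    """Index of the head child of an NP rule, given its child labels."""
    for i in reversed(range(len(children))):   # rightmost NN, NNS or NNP
        if children[i] in NOUN_TAGS:
            return i
    if "NP" in children:                        # leftmost NP
        return children.index("NP")
    for tag in ("JJ", "CD"):                    # rightmost JJ, then rightmost CD
        if tag in children:
            return len(children) - 1 - children[::-1].index(tag)
    return len(children) - 1                    # rightmost child


def vp_head(children):
    """Index of the head child of a VP rule, given its child labels."""
    for i, label in enumerate(children):        # leftmost Vi or Vt
        if label in ("Vi", "Vt"):
            return i
    if "VP" in children:                        # leftmost VP
        return children.index("VP")
    return 0                                    # leftmost child


for rhs in (["DT", "NNP", "NN"], ["DT", "NN", "NNP"], ["NP", "PP"], ["DT", "JJ"], ["DT"]):
    print("NP =>", " ".join(rhs), "| head:", rhs[np_head(rhs)])
```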


Adding Headwords to Trees

(S (NP (DT the) (NN lawyer))
   (VP (Vt questioned)
       (NP (DT the) (NN witness))))

⇓

(S(questioned)
   (NP(lawyer) (DT(the) the) (NN(lawyer) lawyer))
   (VP(questioned) (Vt(questioned) questioned)
       (NP(witness) (DT(the) the) (NN(witness) witness))))


Adding Headwords to Trees (Continued)

(S(questioned)
   (NP(lawyer) (DT(the) the) (NN(lawyer) lawyer))
   (VP(questioned) (Vt(questioned) questioned)
       (NP(witness) (DT(the) the) (NN(witness) witness))))

- A constituent receives its headword from its head child.

  S ⇒ NP VP (S receives headword from VP)
  VP ⇒ Vt NP (VP receives headword from Vt)
  NP ⇒ DT NN (NP receives headword from NN)

We can parse a lexicalized grammar in $O(n^5)$ [how?]


Dependency Representation

Start from the lexicalized tree:

(S(questioned)
   (NP(lawyer) (DT(the) the) (NN(lawyer) lawyer))
   (VP(questioned) (Vt(questioned) questioned)
       (NP(witness) (DT(the) the) (NN(witness) witness))))

Keep only the headwords, then collapse each chain of identical headwords into a single node:

questioned
    lawyer
        the
    witness
        the

Dependency representation is very common.
We will return to it in the future.

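The conversion from a constituency tree to dependencies can be sketched as a single recursive pass: each node takes its headword from its head child, and every non-head child's headword becomes a modifier of it. The tuple encoding of trees, the `HEAD_CHILD` table and the function name are assumptions made for this example; words are plain strings, so repeated words such as "the" are not distinguished by position.

```python
# Head child index for each rule used in the example (from the slides' head rules).
HEAD_CHILD = {
    ("S", ("NP", "VP")): 1,
    ("VP", ("Vt", "NP")): 0,
    ("NP", ("DT", "NN")): 1,
}


def head_and_arcs(tree, arcs):
    """Return the headword of `tree`; append (head, modifier) pairs to `arcs`."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                       # preterminal: the word itself
    heads = [head_and_arcs(child, arcs) for child in children]
    h = HEAD_CHILD[label, tuple(child[0] for child in children)]
    for i, word in enumerate(heads):
        if i != h:
            arcs.append((heads[h], word))        # non-head children modify the head
    return heads[h]


tree = ("S",
        ("NP", ("DT", "the"), ("NN", "lawyer")),
        ("VP", ("Vt", "questioned"),
               ("NP", ("DT", "the"), ("NN", "witness"))))
arcs = []
root = head_and_arcs(tree, arcs)
print(root, arcs)
```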

Dependency Parsing


Evaluation Measures

- UAS: Unlabeled Attachment Score, 90-94 (Eng, WSJ)
  (% of words with correct head)

- LAS: Labeled Attachment Score, 87-92 (Eng, WSJ)
  (% of words with correct head and label)

- Root: ~90 (Eng, WSJ)
  (% of sentences with correct root)

- Exact: 40-50 (Eng, WSJ)
  (% of sentences with exact correct structure)
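The two word-level measures can be computed as below. The function name, the per-word `(head_index, label)` encoding and the label names are illustrative assumptions; the example sentence and its errors are made up.

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head_index, label) pair per word, in sentence order."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head and label
    return uas, las


# "the lawyer questioned the witness"; heads are 1-based word positions, 0 = root.
gold = [(2, "det"), (3, "subj"), (0, "root"), (5, "det"), (3, "obj")]
pred = [(3, "det"), (3, "subj"), (0, "root"), (5, "det"), (3, "subj")]
uas, las = attachment_scores(gold, pred)
print(uas, las)
```

The first word has the wrong head (hurting both scores) and the last word has the right head but the wrong label (hurting only LAS).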


Three main approaches to Dependency Parsing

Conversion
- Parse to constituency structure.
- Extract dependencies from the trees.

Global Optimization (Graph based): argmax over combinatorial space
- Define a scoring function over <sentence, tree> pairs.
- Search for best-scoring structure.
- Simpler scoring ⇒ easier search.
- (Similar to how we do tagging, constituency parsing.)

Greedy decoding (Transition based): while (!done) { do best thing }
- Start with an unparsed sentence.
- Apply locally-optimal actions until sentence is parsed.


Graph-based parsing (Global Search)


Dependency Parsing

Alexander [email protected]

NYU CS 3033


Arcs

Dependency parsing is concerned with head-modifier relationships.

Definitions:

- head: the main word in a phrase

- modifier: an auxiliary word in a phrase

Meaning depends on underlying linguistic formalism.

Common to use head → modifier arc notation

* Millions on the coast face freak storm


Input Notation

Input:

- $x = (w, t)$

- $w_1 \ldots w_n$: the words of the sentence

- $t_1 \ldots t_n$: the tags of the sentence

- Special symbol $w_0 = *$: the pseudo-root

Note: Unlike in CFG parsing, we assume tags are given.


Output Notation

Output:

- set of possible dependency arcs

  $A = \{(h, m) : h \in \{0 \ldots n\},\ m \in \{1 \ldots n\}\}$

- $\mathcal{Y} \subset \{0, 1\}^{|A|}$: set of all valid dependency parses

- $y \in \mathcal{Y}$: a valid dependency parse


Example

* Millions/N on/P the/D coast/N face/V freak/A storm/N

- $w_0 = *$, $w_1 = \text{Millions}$, $w_2 = \text{on}$, $w_3 = \text{the}$, ...

- $t_0 = *$, $t_1 = \text{N}$, $t_2 = \text{P}$, $t_3 = \text{D}$, ...

- $y(0, 5) = 1$, $y(5, 1) = 1$, $y(1, 2) = 1$, ...


Forbidden Structures

- Each (non-root) word must modify exactly one word.

- Arcs must form a tree.

- (Projective) Arcs may not cross each other.

* Millions on the coast face freak storm
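The three constraints can be checked mechanically. In the sketch below a parse is stored as a `heads` list, which makes the single-head constraint hold by construction; the function name and this encoding are assumptions for illustration, and the head assignment for the example sentence follows the arcs used elsewhere in these slides.

```python
def is_valid_parse(heads, projective=True):
    """heads[m] is the head of word m for m = 1..n; heads[0] is unused (pseudo-root)."""
    n = len(heads) - 1
    # Each word has one head, which must be a different position in 0..n.
    if any(not (0 <= heads[m] <= n) or heads[m] == m for m in range(1, n + 1)):
        return False
    # Tree: following heads from any word must reach the root without a cycle.
    for m in range(1, n + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    if projective:
        arcs = [(min(h, m), max(h, m)) for m, h in enumerate(heads) if m > 0]
        for a, b in arcs:
            for c, d in arcs:
                if a < c < b < d:   # spans (a, b) and (c, d) cross
                    return False
    return True


# * Millions on the coast face freak storm
heads = [None, 5, 1, 4, 2, 0, 7, 5]
print(is_valid_parse(heads))
```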


Main Idea

- Define a scoring function $g(y; x, \theta)$
  - This function tells us, for every x (sentence) and y (tree) pair, how good the pair is.
  - $\theta$ are the parameters, or weights (we called them w before)
  - For example: $g(y; x, \theta) = \sum_i \phi_i(x, y)\,\theta_i = \phi(x, y) \cdot \theta$ (a linear model)
- Look for the best y for a given sentence: $\arg\max_y g(y; x, \theta)$


What is a good dependency parse?

$y^* = \arg\max_{y \in \mathcal{Y}} g(y; x, \theta)$

Method:

- Define features for this problem.

- Learn parameters $\theta$ from corpus data.

- Maximize objective to find best parse $y^*$.



First-order Scoring Function

Scoring function $g(y; x, \theta)$ is the sum of first-order arc scores

* Millions on the coast face freak storm

g(y; x, θ) = score(coast → the)
           + score(on → coast)
           + score(Millions → on)
           + score(face → Millions)
           + score(face → storm)
           + score(storm → freak)
           + score(* → face)
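In code, the first-order (arc-factored) score is just a sum over the arcs of the tree. The sketch below uses a made-up score table; the function name and the numbers are illustrative only.

```python
def tree_score(arcs, score):
    """First-order score of a tree: the sum of independent per-arc scores."""
    return sum(score[head, modifier] for head, modifier in arcs)


arcs = [("coast", "the"), ("on", "coast"), ("Millions", "on"), ("face", "Millions"),
        ("face", "storm"), ("storm", "freak"), ("*", "face")]
# Hypothetical arc scores, one per arc in the order above.
score = dict(zip(arcs, [2.0, 1.5, 0.5, 3.0, 2.5, 1.0, 4.0]))
total = tree_score(arcs, score)
print(total)
```

Because no arc's score depends on any other arc, the search for the best tree can reuse standard graph and dynamic programming algorithms.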


Conditional Model (e.g. CRF)

Define:

$\text{score}(w_h \to w_m) = \phi(x, \langle h, m \rangle) \cdot \theta$

where:

- $\phi(x, \langle h, m \rangle) : \mathcal{X} \times A \to \{0, 1\}^p$: a feature function

- $\theta \in \mathbb{R}^p$: a parameter vector (assume given)

- $p$: number of features

Feature-based Discriminative Model


Features

- Features are critical for dependency parsing performance.

- Specified as a vector of indicators.

  $\phi_{\text{NAME}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_m = u \\ 0, & \text{o.w.} \end{cases}$

- Each feature has a corresponding real-valued weight.

  $\theta_{\text{NAME}} = 9.23$


Features: Tags

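$\forall u \in T:\quad \phi_{\text{TAG:M:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_m = u \\ 0, & \text{o.w.} \end{cases}$

$\forall u \in T:\quad \phi_{\text{TAG:H:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_h = u \\ 0, & \text{o.w.} \end{cases}$

$\forall u, v \in T:\quad \phi_{\text{TAG:H:M:}u\text{:}v}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_h = u \text{ and } t_m = v \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N
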

Features: Words

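$\forall u \in W:\quad \phi_{\text{WORD:M:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } w_m = u \\ 0, & \text{o.w.} \end{cases}$

$\forall u \in W:\quad \phi_{\text{WORD:H:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } w_h = u \\ 0, & \text{o.w.} \end{cases}$

$\forall u, v \in W:\quad \phi_{\text{WORD:H:M:}u\text{:}v}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } w_h = u \text{ and } w_m = v \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N
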

Features: Context Tags

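$\forall u \in T^4:\quad \phi_{\text{CON:}-1\text{:}-1\text{:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_{h-1} = u_1 \text{ and } t_h = u_2 \text{ and } t_{m-1} = u_3 \text{ and } t_m = u_4 \\ 0, & \text{o.w.} \end{cases}$

$\forall u \in T^4:\quad \phi_{\text{CON:}1\text{:}-1\text{:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_{h+1} = u_1 \text{ and } t_h = u_2 \text{ and } t_{m-1} = u_3 \text{ and } t_m = u_4 \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N
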

Features: Between Tags

$\forall u \in T:\quad \phi_{\text{BET:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_i = u \text{ for } i \text{ between } h \text{ and } m \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N


Features: Direction

$\phi_{\text{RIGHT}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } h > m \\ 0, & \text{o.w.} \end{cases}$

$\phi_{\text{LEFT}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } h < m \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N


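Features: Length

$\phi_{\text{LEN:2}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } |h - m| > 2 \\ 0, & \text{o.w.} \end{cases}$

$\phi_{\text{LEN:5}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } |h - m| > 5 \\ 0, & \text{o.w.} \end{cases}$

$\phi_{\text{LEN:10}}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } |h - m| > 10 \\ 0, & \text{o.w.} \end{cases}$

* Millions/N on/P the/D coast/N face/V freak/A storm/N
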

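Features: Backoffs and Combinations

- Additionally include backoff.

  $\forall u \in T^3:\quad \phi_{\text{CON:}-1\text{:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if } t_{h-1} = u_1 \text{ and } t_h = u_2 \text{ and } t_m = u_3 \\ 0, & \text{o.w.} \end{cases}$

- As well as combination features.

  $\forall u \in T:\quad \phi_{\text{LEN:2:DIR:LEFT:TAG:M:}u}(\langle t, w \rangle, \langle h, m \rangle) = \begin{cases} 1, & \text{if all component features are on} \\ 0, & \text{o.w.} \end{cases}$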
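Because every feature is a 0/1 indicator, a practical representation is the list of names of the features that fire, with θ stored as a dictionary from name to weight. The sketch below covers the tag, word, direction, between and length templates; the function names and the weights are hypothetical, and the context, backoff and combination templates are omitted for brevity.

```python
def arc_features(tags, words, h, m):
    """Names of the indicator features that fire for the arc h -> m."""
    feats = [
        f"TAG:M:{tags[m]}",
        f"TAG:H:{tags[h]}",
        f"TAG:H:M:{tags[h]}:{tags[m]}",
        f"WORD:M:{words[m]}",
        f"WORD:H:{words[h]}",
        f"WORD:H:M:{words[h]}:{words[m]}",
        "RIGHT" if h > m else "LEFT",      # direction, as defined on the slide
    ]
    lo, hi = sorted((h, m))
    feats += [f"BET:{t}" for t in sorted(set(tags[lo + 1:hi]))]   # tags in between
    feats += [f"LEN:{k}" for k in (2, 5, 10) if abs(h - m) > k]   # length buckets
    return feats


def arc_score(theta, tags, words, h, m):
    """phi(x, <h, m>) . theta with a sparse weight dictionary."""
    return sum(theta.get(f, 0.0) for f in arc_features(tags, words, h, m))


words = ["*", "Millions", "on", "the", "coast", "face", "freak", "storm"]
tags = ["*", "N", "P", "D", "N", "V", "A", "N"]
theta = {"TAG:H:M:V:N": 9.23, "LEN:2": -1.0}   # made-up weights
feats = arc_features(tags, words, 5, 1)        # face -> Millions
print(feats)
print(arc_score(theta, tags, words, 5, 1))
```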


First-Order Results

Model                     Accuracy
NoPOSContextBetween       86.0
NoEdge                    87.3
NoAttachmentOrDistance    88.1
NoBiLex                   90.6
Full                      90.7

From McDonald (2006)


What’s left

- Define features for this problem.

- Learn parameters $\theta$ from corpus data.

- Maximize objective to find best parse $y^*$.

Downside: Higher-order models make inference more difficult

$y^* = \arg\max_{y \in \mathcal{Y}} g(y; x, \theta)$


Parsing

Goal: Finding the best parse.

$y^* = \arg\max_{y \in \mathcal{Y}} g(y; x, \theta)$


Graph Algorithms

Algorithm 2: Use graph algorithms for parsing.

Find the maximum directed spanning tree.

- Chu-Liu-Edmonds Algorithm: $O(n^3)$

- Tarjan's Extension: $O(n^2)$


Maximum Directed Spanning Tree Algorithm


Issues with MST Algorithm

- Allows non-projective parses.

  * Millions on the coast face freak storm

  - Good for some languages.

- Cannot incorporate higher-order parts.
  - Problem becomes NP-Hard.


Dynamic Programming for Parsing

Algorithm 3: Use a specialized dynamic programming algorithm.

- The Eisner algorithm (1996) for bilexical parsing.

- Use split-head trick. Handle left and right dependencies separately.


Dependency Parsing New Example

* As McGwire neared , fans went wild


Base Case

* As McGwire neared , fans went wild


Dependency Parsing Algorithm - First-order Model

[Figure: the two combination rules]

- Incomplete item (h, m) = complete item (h, r) + complete item (r+1, m)
- Complete item (h, e) = incomplete item (h, m) + complete item (m, e)


Parsing

* As McGwire neared , fans went wild


Algorithm Key

• L: left-facing item
• R: right-facing item
• C: completed item (triangle)
• I: incomplete item (trapezoid)

Page 189: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Algorithm

Initialize:
for i in 0 ... n do
    π[C, L, i, i] = 0
    π[C, R, i, i] = 0
    π[I, L, i, i] = 0
    π[I, R, i, i] = 0

Inner Loop:
for k in 1 ... n do
    for s in 0 ... n do
        t ← k + s
        if t > n then break
        π[I, L, s, t] = max_{r ∈ s ... t−1} π[C, R, s, r] + π[C, L, r+1, t] + score(t, s)
        π[I, R, s, t] = max_{r ∈ s ... t−1} π[C, R, s, r] + π[C, L, r+1, t] + score(s, t)
        π[C, L, s, t] = max_{r ∈ s ... t−1} π[C, L, s, r] + π[I, L, r, t]
        π[C, R, s, t] = max_{r ∈ s+1 ... t} π[I, R, s, r] + π[C, R, r, t]

return π[C, R, 0, n]

Here score(h, m) is the first-order score of the arc from head h to modifier m.
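The chart recurrences can be sketched in Python as follows. This is a score-only sketch under stated assumptions, not the slides' implementation: the input is a hypothetical matrix `score[h][m]` of first-order arc scores, token 0 is the root, arcs into the root are forbidden, and no backpointers are kept, so the tree itself is not recovered. The arc-score terms follow the standard first-order formulation of the Eisner algorithm.

```python
import math
from typing import List


def eisner_best_score(score: List[List[float]]) -> float:
    """Best projective tree score; score[h][m] scores the arc h -> m."""
    n = len(score) - 1          # tokens 1..n, token 0 is the root
    neg = -math.inf

    def table() -> List[List[float]]:
        t = [[neg] * (n + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            t[i][i] = 0.0       # base case: single-token spans
        return t

    # C = complete, I = incomplete; L/R = left-/right-facing items.
    C_L, C_R, I_L, I_R = table(), table(), table(), table()

    for k in range(1, n + 1):           # span length
        for s in range(0, n + 1):
            t = s + k
            if t > n:
                break
            # Join two complete halves, then add the new arc.
            joined = max(C_R[s][r] + C_L[r + 1][t] for r in range(s, t))
            # Arc t -> s; forbidden when s is the root (assumption).
            I_L[s][t] = joined + score[t][s] if s != 0 else neg
            # Arc s -> t.
            I_R[s][t] = joined + score[s][t]
            C_L[s][t] = max(C_L[s][r] + I_L[r][t] for r in range(s, t))
            C_R[s][t] = max(I_R[s][r] + C_R[r][t] for r in range(s + 1, t + 1))
    return C_R[0][n]


# Two-word toy example (hypothetical scores).
toy = [
    [0.0, 1.0, 5.0],   # root -> 1, root -> 2
    [0.0, 0.0, 2.0],   # 1 -> 2
    [0.0, 4.0, 0.0],   # 2 -> 1
]
print(eisner_best_score(toy))  # 9.0: root -> 2 and 2 -> 1
```

The three nested ranges (k, s, r) give the O(n^3) running time quoted for projective first-order parsing.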

Page 190: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Graph-based parsing algorithm

• Begin with a tagged sentence (can use a POS tagger).
• Extract a set of "parts":
  • For a first-order model, each part is an (h, m) pair (O(n^2) parts).
  • For a second-order model, each part is an (h, m1, m2) tuple (O(n^3) parts).
• Calculate a score for each part (using feature extractor φ and parameters θ).
• Find a valid parse tree that is composed of the best parts:
  • using Chu-Liu-Edmonds (for first-order non-projective) (O(n^2))
  • using a dynamic-programming algorithm (for first- and second-order projective) (O(n^3))

Does this remind you of anything?

Page 195: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Training - setting values for θ

Page 196: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Note: we need values such that g(y; x, θ) of the gold tree y is larger than g(y′; x, θ) for all other trees y′.

Page 197: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Perceptron Sketch: Part 1

• (x_1, y_1) ... (x_n, y_n): training data

• Gold features: Σ_{a ∈ A : y_i(a) = 1} φ(x_i, a)

Idea: Increase value (in θ) of gold features.

Page 198: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Perceptron Sketch: Part 2

• Best-scoring structure: z_i = argmax_{z ∈ Y} g(z; x_i, θ)

• Best-scoring structure features: Σ_{a ∈ A : z_i(a) = 1} φ(x_i, a)

Idea: Decrease value (in θ) of wrong best-scoring features.

Page 199: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Perceptron Algorithm

θ ← 0
for t = 1 ... T, i = 1 ... n do
    z_i = argmax_{y ∈ Y} g(y; x_i, θ)
    gold ← Σ_{a ∈ A : y_i(a) = 1} φ(x_i, a)
    best ← Σ_{a ∈ A : z_i(a) = 1} φ(x_i, a)
    θ ← θ + gold − best
return θ
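The update loop can be sketched in Python as below. Everything beyond the loop itself is an assumption made for illustration: structures are sets of (head, modifier) parts, `phi` is a toy feature extractor over tag pairs, and the argmax is a brute-force search over a hypothetical `candidates` function, where a real parser would call a decoder such as Eisner or Chu-Liu-Edmonds.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Part = Tuple[int, int]                 # (head, modifier); 0 is the root
Features = Dict[Hashable, float]


def feature_sum(x, parts, phi) -> Features:
    """Sum phi(x, a) over all parts a of a structure."""
    total: Features = {}
    for a in parts:
        for f, v in phi(x, a).items():
            total[f] = total.get(f, 0.0) + v
    return total


def g(y, x, theta, phi) -> float:
    """Score of structure y: dot product of theta with its features."""
    return sum(theta.get(f, 0.0) * v for f, v in feature_sum(x, y, phi).items())


def perceptron(data, candidates, phi, T=5) -> Features:
    theta: Features = {}
    for _ in range(T):
        for x, y_gold in data:
            # Inference: best-scoring structure under the current theta.
            z = max(candidates(x), key=lambda y: g(y, x, theta, phi))
            gold = feature_sum(x, y_gold, phi)
            best = feature_sum(x, z, phi)
            # theta <- theta + gold - best (no net change when z is gold).
            for f, v in gold.items():
                theta[f] = theta.get(f, 0.0) + v
            for f, v in best.items():
                theta[f] = theta.get(f, 0.0) - v
    return theta


# Toy setup (hypothetical): two-word sentences given as tag tuples.
def phi(x, a: Part) -> Features:
    tags = ("ROOT",) + x
    h, m = a
    return {(tags[h], tags[m]): 1.0}


def candidates(x):
    # All three dependency trees over two words.
    return [frozenset({(0, 1), (1, 2)}),
            frozenset({(0, 2), (2, 1)}),
            frozenset({(0, 1), (0, 2)})]


data = [(("N", "V"), frozenset({(0, 2), (2, 1)})),
        (("V", "N"), frozenset({(0, 1), (1, 2)}))]
theta = perceptron(data, candidates, phi)
for x, y_gold in data:
    print(max(candidates(x), key=lambda y: g(y, x, theta, phi)) == y_gold)
```

On this toy data a single update is enough: afterwards both gold trees outscore their competitors, so later passes leave θ unchanged.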

Page 200: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Theory

• If possible (that is, if the training data is separable), the perceptron will separate the correct structure from the incorrect structures.

• That is, it will find a θ that assigns y_i a higher score than any other y ∈ Y for each example.

Page 201: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Practical Training Considerations

• Training requires solving inference many times.

• Computing feature values is often time-consuming.

• In practice, the averaged perceptron variant is preferred (Collins, 2002).

Page 202: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Conclusion

Method:

• Define features for this problem.

• Learn parameters θ from corpus data.

• Maximize the objective to find the best parse y*.

Structured prediction framework, applicable to many problems.

Page 203: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Transition-based parsing


Page 204: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Transition-based (greedy) parsing

1. Start with an unparsed sentence.
2. Apply locally-optimal actions until the sentence is parsed.
3. Use whatever features you want.
4. Surprisingly accurate.
5. Can be extremely fast.

Page 206: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Intro to Transition-based Dependency Parsing

An abstract machine composed of a stack and a buffer.

Machine is initialized with the words of a sentence.

A set of actions processes the words by moving them from the buffer to the stack, removing them from the stack, or adding links between them.

A specific set of actions defines a transition system.

Page 207: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

The Arc-Eager Transition System

• SHIFT: move the first word from the buffer to the stack. (pre: Buffer not empty.)

• LEFTARC_label: make the first word in the buffer the head of the top of the stack, pop the stack. (pre: Stack not empty. Top of stack does not have a parent.)

• RIGHTARC_label: make the top of the stack the head of the first word in the buffer, move the first word in the buffer to the stack. (pre: Buffer not empty.)

• REDUCE: pop the stack. (pre: Stack not empty. Top of stack has a parent.)
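The four actions and their preconditions can be sketched as a small Python class. Several choices here are assumptions of the sketch rather than anything the slides specify: arcs are unlabeled, words are 0-based indices, the stack starts empty with no explicit root node, a configuration is final once the buffer is empty, and `apply` mutates the configuration in place. The sketch also checks that the buffer is non-empty for LEFTARC and that the stack is non-empty for RIGHTARC, which the listed preconditions leave implicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Configuration:
    stack: List[int]
    buffer: List[int]
    heads: Dict[int, int] = field(default_factory=dict)  # modifier -> head

    def is_final(self) -> bool:
        # Assumption: parsing stops once the buffer is empty.
        return not self.buffer

    def allowed(self, action: str) -> bool:
        top_has_head = bool(self.stack) and self.stack[-1] in self.heads
        if action == "SHIFT":
            return bool(self.buffer)
        if action == "LEFTARC":
            return bool(self.stack) and bool(self.buffer) and not top_has_head
        if action == "RIGHTARC":
            return bool(self.stack) and bool(self.buffer)
        if action == "REDUCE":
            return top_has_head
        return False

    def apply(self, action: str) -> None:
        if not self.allowed(action):
            raise ValueError(f"precondition of {action} not met")
        if action == "SHIFT":
            self.stack.append(self.buffer.pop(0))
        elif action == "LEFTARC":
            # First buffer word becomes head of the stack top; pop the stack.
            self.heads[self.stack.pop()] = self.buffer[0]
        elif action == "RIGHTARC":
            # Stack top becomes head of the first buffer word; move it over.
            self.heads[self.buffer[0]] = self.stack[-1]
            self.stack.append(self.buffer.pop(0))
        elif action == "REDUCE":
            self.stack.pop()


def initialize(n_words: int) -> Configuration:
    return Configuration(stack=[], buffer=list(range(n_words)))


# "She ate pizza" as word indices 0, 1, 2.
config = initialize(3)
for action in ["SHIFT", "LEFTARC", "SHIFT", "RIGHTARC"]:
    config.apply(action)
print(config.heads)  # {0: 1, 2: 1}: "ate" heads both "She" and "pizza"
```

A greedy parser would replace the fixed action list with a classifier that picks, at each step, the best-scoring action among those for which `allowed` returns True.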

Page 211: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

Parsing Example

[Figure: successive arc-eager parser configurations for the sentence "She ate pizza with pleasure".]

Page 222: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

What do we know about the arc-eager transition system?

• Every sequence of actions results in a valid projective structure.

• Every projective tree is derivable by (at least one) sequence of actions.

• Given a tree, we can find a sequence of actions that derives it (an "oracle").

We know these things also for the arc-standard, arc-hybrid and other transition systems.

Page 223: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

for sentence, tree pair in corpus do
    sequence ← oracle(sentence, tree)
    configuration ← initialize(sentence)
    while not configuration.IsFinal() do
        action ← sequence.next()
        configuration ← configuration.apply(action)
    return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

[Figure: parser configurations for "She ate pizza with pleasure" after each action.]
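One way an `oracle` function might look is sketched below as a simple static oracle for the arc-eager system. The slides do not show their oracle, so the rules here are an assumption: unlabeled arcs, an initially empty stack with no explicit root node, a single-rooted projective gold tree, and termination as soon as the buffer is empty. The gold tree used for the example sentence is also assumed, since the figure is not available in the text.

```python
from typing import Dict, List


def static_oracle(n_words: int, gold: Dict[int, int]) -> List[str]:
    """Derive an arc-eager action sequence for a gold tree.

    gold maps each non-root word index to its head index.
    """
    stack: List[int] = []
    buffer = list(range(n_words))
    heads: Dict[int, int] = {}
    actions: List[str] = []
    while buffer:
        b = buffer[0]
        s = stack[-1] if stack else None
        if s is not None and gold.get(s) == b:
            # Buffer front is the gold head of the stack top.
            heads[s] = b
            stack.pop()
            actions.append("LEFTARC")
        elif s is not None and gold.get(b) == s:
            # Stack top is the gold head of the buffer front.
            heads[b] = s
            stack.append(buffer.pop(0))
            actions.append("RIGHTARC")
        elif s is not None and s in heads and all(gold.get(w) != s for w in buffer):
            # Stack top has its head and no dependents left in the buffer.
            stack.pop()
            actions.append("REDUCE")
        else:
            stack.append(buffer.pop(0))
            actions.append("SHIFT")
    if heads != gold:
        raise ValueError("tree not derivable (is it projective?)")
    return actions


# Word indices: She=0, ate=1, pizza=2, with=3, pleasure=4.
# Assumed gold tree: "ate" is the root; "with" attaches to "ate".
gold_tree = {0: 1, 2: 1, 3: 1, 4: 3}
print(" ".join(static_oracle(5, gold_tree)))
# SHIFT LEFTARC SHIFT RIGHTARC REDUCE RIGHTARC RIGHTARC
```

Under these assumptions the output corresponds to the first seven actions of the sequence on the slide. The slide's three trailing RE actions empty the stack, which this sketch skips because it stops when the buffer is empty.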

Page 224: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 225: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 226: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 227: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 228: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 229: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 230: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 231: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 232: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 233: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 234: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 235: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 236: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 237: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 238: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 239: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1

Page 240: u.cs.biu.ac.ilu.cs.biu.ac.il/~89-680/parsing-algorithms.pdf · Title: parsing-algorithms

This knowledge is quite powerful

Parsing with an oracle sequence

placeholderfor sentence,tree pair in corpus do

sequence oracle(sentence, tree)configuration initialize(sentence)while not configuration.IsFinal() do

action sequence.next()configuration configuration.apply(action)

return configuration.tree

“She ate pizza with pleasure”

SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

A She ate pizza with pleasure

A She ate pizza with pleasureA She ate pizza with pleasureA She ate pizza with pleasure

35 / 1



Parsing without an oracle

for sentence, tree pair in corpus do
    start with weight vector w
    configuration ← initialize(sentence)
    while not configuration.IsFinal() do
        action ← predict(w, φ(configuration))
        configuration ← configuration.apply(action)
    return configuration.tree

• φ(configuration): summarize the configuration as a feature vector.
• predict: predict the action based on the features.
• w: need to learn the correct weights.
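The two pieces named above, φ and predict, can be sketched as follows. This is only an illustration: the feature templates (word on top of the stack, first word in the buffer, and their pair) and the linear model with one weight per (action, feature) pair are assumptions, not the feature set or classifier used in the lecture.

```python
ACTIONS = ["SH", "LEFT", "RIGHT", "RE"]


def phi(words, stack, next_word):
    """Summarize a configuration as a sparse feature vector (name -> value)."""
    s0 = words[stack[-1]] if stack else "<none>"                  # top of the stack
    b0 = words[next_word] if next_word < len(words) else "<none>"  # front of the buffer
    return {
        "bias": 1.0,
        "s0=" + s0: 1.0,
        "b0=" + b0: 1.0,
        "s0,b0=" + s0 + "," + b0: 1.0,
    }


def score(w, features, action):
    # Linear score: sum of the weights of the active features for this action.
    return sum(w.get((action, name), 0.0) * value for name, value in features.items())


def predict(w, features):
    # Highest-scoring action; ties go to the earliest entry in ACTIONS.
    return max(ACTIONS, key=lambda action: score(w, features, action))


# Hypothetical hand-set weight, only to show the call pattern.
w = {("LEFT", "s0,b0=She,ate"): 1.0}
features = phi(["She", "ate", "pizza"], stack=[0], next_word=1)
print(predict(w, features))
```

A real parser would also restrict the prediction to actions that are valid in the current configuration; that check is left out here.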


Learning a parser (batch)

training_set ← []
for sentence, tree pair in corpus do
    sequence ← oracle(sentence, tree)
    configuration ← initialize(sentence)
    while not configuration.IsFinal() do
        action ← sequence.next()
        features ← φ(configuration)
        training_set.add(features, action)
        configuration ← configuration.apply(action)
train a classifier on training_set
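A sketch of how the training set could be collected. It assumes the same arc-eager-style reading of SH / LEFT / RIGHT / RE without an artificial root, a gold tree given as a list `heads` (with `None` for the root), and projective, single-rooted trees. The `oracle` rule set and the tiny feature function are illustrative assumptions; the final step, training a classifier, is not shown and could use any multiclass learner.

```python
def oracle(heads):
    """Derive an action sequence from gold heads (heads[i] is the head of word i)."""
    n = len(heads)
    stack, nxt, actions = [], 0, []
    while nxt < n or stack:
        top = stack[-1] if stack else None
        if nxt == n:
            action = "RE"                      # buffer empty: only reductions remain
        elif top is None:
            action = "SH"
        elif heads[top] == nxt:
            action = "LEFT"                    # buffer front is the head of the stack top
        elif heads[nxt] == top:
            action = "RIGHT"                   # stack top is the head of the buffer front
        elif any(heads[k] == nxt or heads[nxt] == k for k in stack[:-1]):
            action = "RE"                      # buffer front links to a word deeper in the stack
        else:
            action = "SH"
        actions.append(action)
        if action in ("LEFT", "RE"):
            stack.pop()
        else:
            stack.append(nxt)
            nxt += 1
    return actions


def phi(words, stack, nxt):
    # Illustrative features: word on top of the stack and first word in the buffer.
    s0 = words[stack[-1]] if stack else "<none>"
    b0 = words[nxt] if nxt < len(words) else "<none>"
    return {"s0=" + s0: 1.0, "b0=" + b0: 1.0}


def build_training_set(corpus):
    training_set = []
    for words, heads in corpus:
        stack, nxt = [], 0                     # arcs are not tracked: phi does not use them
        for action in oracle(heads):
            training_set.append((phi(words, stack, nxt), action))
            if action in ("LEFT", "RE"):
                stack.pop()
            else:
                stack.append(nxt)
                nxt += 1
    return training_set


corpus = [("She ate pizza with pleasure".split(), [1, None, 1, 1, 3])]
for features, action in build_training_set(corpus):
    print(action, sorted(features))
```

For the example sentence this oracle yields the same sequence as the earlier slide, SH LEFT SH RIGHT RE RIGHT RIGHT RE RE RE, so the training set holds ten (features, action) pairs.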


Learning a parser (online)

w ← 0
for sentence, tree pair in corpus do
    sequence ← oracle(sentence, tree)
    configuration ← initialize(sentence)
    while not configuration.IsFinal() do
        action ← sequence.next()
        features ← φ(configuration)
        predicted ← predict(w, φ(configuration))
        if predicted ≠ action then
            w.update(φ(configuration), action, predicted)
        configuration ← configuration.apply(action)
return w
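The slide does not say what `w.update` does. One common choice, assumed here, is the multiclass perceptron update: add the features to the weights of the oracle action and subtract them from the weights of the wrongly predicted action. Because the loop always applies the oracle action, the configurations visited do not depend on w, so this sketch trains on pre-extracted (features, action) pairs. The `epochs` parameter and all function names are illustrative.

```python
ACTIONS = ["SH", "LEFT", "RIGHT", "RE"]


def predict(w, features):
    def score(action):
        return sum(w.get((action, name), 0.0) * value for name, value in features.items())
    # Highest-scoring action; ties go to the earliest entry in ACTIONS.
    return max(ACTIONS, key=score)


def update(w, features, gold, predicted):
    # Perceptron step: reward the oracle action, penalize the predicted one.
    for name, value in features.items():
        w[(gold, name)] = w.get((gold, name), 0.0) + value
        w[(predicted, name)] = w.get((predicted, name), 0.0) - value


def train_online(examples, epochs=5):
    w = {}                                     # w <- 0 (missing keys count as zero)
    for _ in range(epochs):
        for features, gold in examples:
            predicted = predict(w, features)
            if predicted != gold:
                update(w, features, gold, predicted)
    return w


# Hypothetical toy examples; real ones come from oracle runs over a treebank.
examples = [
    ({"s0=She": 1.0, "b0=ate": 1.0}, "LEFT"),
    ({"s0=ate": 1.0, "b0=pizza": 1.0}, "RIGHT"),
    ({"s0=<none>": 1.0, "b0=She": 1.0}, "SH"),
]
w = train_online(examples)
print([predict(w, features) for features, _ in examples])
```

The slide makes a single pass over the corpus; several passes, and often weight averaging, are typical in practice.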


Parsing time

configuration ← initialize(sentence)
while not configuration.IsFinal() do
    action ← predict(w, φ(configuration))
    configuration ← configuration.apply(action)
return configuration.tree

In short

• Summarize the configuration by a set of features.
• Learn the best action to take at each configuration.
• Hope this generalizes well.

Transition Based Parsing

• A different approach.
• Very common.
• Can be as accurate as first-order graph-based parsing.
  • Higher-order graph-based parsers are still better.
• Easy to implement.
• Very fast: O(n).
• Can be improved further:
  • Easy-first
  • Dynamic oracle
  • Beam search

Neural Networks


Neural-network (deep learning) based approaches

• Both graph-based and transition-based models benefit from the move to neural networks.
• Same overall approach and algorithm as before, but:
  • Replace the classifier: from linear to MLP.
  • Use pre-trained word embeddings.
  • Replace the feature extractor with a Bi-LSTM.
• Now exploring:
  • Semi-supervised learning.
  • Multi-task learning objectives.
  • Out-of-domain parsing.

Neural Networks (deep learning): seq2seq to linearized trees

Use the "sequence to sequence with attention" model used for Machine Translation (details in DL4Seq course).

Treat parsing as a translation from sentence to linearized tree.

[Figure: the tree is linearized as (S (NP Adj Noun NP) (VP Vb (NP Det Noun NP) VP) S); an NMT model (seq2seq + attention) maps the input "Fruit flies like a banana" to this string.]
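The linearization step can be sketched in a few lines of Python. The nested-tuple tree representation and the function name `linearize` are assumptions made for this example; the output format follows the bracket string shown on the slide, where each closing bracket repeats the constituent label and the words themselves are replaced by their tags.

```python
def linearize(tree):
    """Turn a tree into a bracket string; a leaf is a plain string (here a POS tag)."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    inner = " ".join(linearize(child) for child in children)
    return f"({label} {inner} {label})"


# "Fruit flies like a banana", with words replaced by their tags.
tree = ("S", [
    ("NP", ["Adj", "Noun"]),
    ("VP", ["Vb", ("NP", ["Det", "Noun"])]),
])
print(linearize(tree))  # (S (NP Adj Noun NP) (VP Vb (NP Det Noun NP) VP) S)
```

The seq2seq model is then trained on (sentence, bracket string) pairs, exactly as a translation model is trained on sentence pairs.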


Hybrid Approaches


Hybrid approaches

• Different parsers have different strengths.
  ⇒ Combine several parsers.

Stacking

• Run parser A.
• Use the tree from parser A to add features to parser B.

Voting

• Parse the sentence with k different parsers.
• Each parser “votes” on its dependency arcs.
• Run a first-order graph-based parser to find the tree with the best arcs according to the votes.
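The vote-counting part of the voting scheme can be sketched as below. Note the simplification: instead of running a first-order graph-based parser over the vote counts, as the slide describes, this sketch just picks the most-voted head for each word, which does not guarantee a well-formed tree. The representation (a parse is a list `heads`, with -1 for the root) and the function names are assumptions.

```python
from collections import Counter


def tally_votes(parses):
    """Count how many parsers proposed each (head, dependent) arc."""
    votes = Counter()
    for heads in parses:
        for dependent, head in enumerate(heads):
            votes[(head, dependent)] += 1
    return votes


def most_voted_heads(votes, n):
    """Stand-in for the graph-based decoding step: per-word argmax over the votes."""
    heads = []
    for dependent in range(n):
        candidates = [(count, head) for (head, d), count in votes.items() if d == dependent]
        # Most votes wins; ties go to the smaller head index.
        count, head = max(candidates, key=lambda c: (c[0], -c[1]))
        heads.append(head)
    return heads


# Three hypothetical parser outputs for "She ate pizza with pleasure".
parses = [
    [1, -1, 1, 1, 3],   # "with" attached to "ate"
    [1, -1, 1, 2, 3],   # "with" attached to "pizza"
    [1, -1, 1, 1, 3],
]
votes = tally_votes(parses)
print(most_voted_heads(votes, 5))
```

In the full method, the vote counts would serve as arc scores for a first-order decoder, which returns the highest-scoring tree rather than independent per-word choices.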


Semi-supervised approaches

• We only see very few words (and word-pairs) in training data.
• If we know (eat, carrot) is a good pair, what do we know about (eat, tomato)?
  • Nothing, if the pair is not in our training data!
  ⇒ Use unlabeled data.

Cluster Features

• Represent words as context vectors.
• Define a similarity measure between vectors.
• Use a clustering algorithm to cluster the words.
• We hope that:
  • (eat, drink, devour, ...) are in the same cluster.
  • (tomato, carrot, pizza, ...) are in the same cluster.
• Use clusters as additional features to the parser.
• This works well (better?) also for POS-tagging, NER.
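The cluster-feature recipe can be sketched end to end on a toy corpus. Everything specific here is an assumption made for illustration: context vectors count the immediate left and right neighbours, similarity is cosine, and the clustering is a simple greedy threshold scheme standing in for whatever clustering algorithm is actually used (the slide does not name one).

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Represent each word by counts of its immediate left/right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            if i > 0:
                vectors[word]["L:" + sentence[i - 1]] += 1
            if i + 1 < len(sentence):
                vectors[word]["R:" + sentence[i + 1]] += 1
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    representatives, cluster_of = [], {}
    for word in sorted(vectors):
        for cluster_id, representative in enumerate(representatives):
            if cosine(vectors[word], vectors[representative]) >= threshold:
                cluster_of[word] = cluster_id
                break
        else:
            cluster_of[word] = len(representatives)
            representatives.append(word)
    return cluster_of


def pair_feature(cluster_of, head, dependent):
    # Extra parser feature that backs off from the words to their clusters.
    return f"head_cluster={cluster_of[head]},dep_cluster={cluster_of[dependent]}"


# Toy unlabeled corpus; "eat tomato" never occurs in it.
unlabeled = [s.split() for s in [
    "they eat carrot today",
    "they devour carrot today",
    "they devour tomato today",
]]
cluster_of = cluster_words(context_vectors(unlabeled))
print(pair_feature(cluster_of, "eat", "carrot") == pair_feature(cluster_of, "eat", "tomato"))
```

On this corpus "eat" and "devour" share a cluster, as do "carrot" and "tomato", so the pair (eat, tomato) gets the same cluster feature as (eat, carrot) even though it was never observed.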


Available Software

There are many parsers available for download, including:

Constituency (PCFG)

• Stanford Parser (can also produce dependencies)
• Berkeley Parser
• Charniak Parser
• Collins Parser

Dependency

• RBGParser, TurboParser (graph based)
• ZPar (transition + beam)
• ClearNLP (many variants)
• EasyFirst (my own)
• Bist Parser (from BGU lab, biLSTM, graph + transition)
• SpaCy (nice API, super fast!!)

Summary

Dependency Parsers

• Conversion from Constituency
• Graph-based
• Transition-based
• Hybrid / Ensemble
• Semi-supervised (cluster features)