97
1 Dependency Parsing COM4513/6513 Natural Language Processing Nikos Aletras [email protected] @nikaletras Computer Science Department Week 5 Spring 2020

Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

1

Dependency ParsingCOM4513/6513 Natural Language Processing

Nikos [email protected]

@nikaletras

Computer Science Department

Week 5Spring 2020

Page 2: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

2

In previous lectures...

Text Classification: Given an instance x (e.g. document),predict a label y ∈ YTasks: sentiment analysis, topic classification, etc.

Algorithm: Logistic Regression

Page 3: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

3

In previous lectures...

Sequence labelling: Given a sequence of wordsx = [x1, ...xN ], predict a sequence of labels y ∈ YN

Tasks: part of speech tagging, named entity recognition, etc.

Algorithms: Hidden Markov Models, Conditional RandomFields

Page 4: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

4

In this lecture...

Model richer linguistic representations: graphs

Dependency parses: Graphs representing syntactic relationsbetween words in a sentence

Two approaches:

Graph-based Dependency ParsingTransition-based Dependency Parsing

Page 5: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

4

In this lecture...

Model richer linguistic representations: graphs

Dependency parses: Graphs representing syntactic relationsbetween words in a sentence

Two approaches:

Graph-based Dependency ParsingTransition-based Dependency Parsing

Page 6: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

5

Dependecny Parsing: Applications

Relation extraction, e.g. identify entity pairs (AM, ArcticMonkeys), (Abbey Road, Beatles), (Different Class, Pulp)with the relation music album by

Question answering

Sentiment analysis

Page 7: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

6

What is a Dependency Parse?

Dependency parse (or tree): Graph representing syntacticrelations between words in a sentence

Nodes (or vertices): Words in a sentence

Edges (or arcs): Syntactic relations between words, e.g. dog

is the subject (nsubj) of likes (list of standard dependencyrelations)

Dependency Parsing: Automatically identify the syntacticrelations between words in a sentence

Page 8: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

6

What is a Dependency Parse?

Dependency parse (or tree): Graph representing syntacticrelations between words in a sentence

Nodes (or vertices): Words in a sentence

Edges (or arcs): Syntactic relations between words, e.g. dog

is the subject (nsubj) of likes (list of standard dependencyrelations)

Dependency Parsing: Automatically identify the syntacticrelations between words in a sentence

Page 9: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

6

What is a Dependency Parse?

Dependency parse (or tree): Graph representing syntacticrelations between words in a sentence

Nodes (or vertices): Words in a sentence

Edges (or arcs): Syntactic relations between words, e.g. dog

is the subject (nsubj) of likes (list of standard dependencyrelations)

Dependency Parsing: Automatically identify the syntacticrelations between words in a sentence

Page 10: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

7

Graph constraints

Connected: every word can be reached from any other wordignoring edge directionality

Acyclic: can’t re-visit the same word on a directed path

Single-Head: every word can have only one head

Page 11: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

7

Graph constraints

Connected: every word can be reached from any other wordignoring edge directionality

Acyclic: can’t re-visit the same word on a directed path

Single-Head: every word can have only one head

Page 12: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

7

Graph constraints

Connected: every word can be reached from any other wordignoring edge directionality

Acyclic: can’t re-visit the same word on a directed path

Single-Head: every word can have only one head

Page 13: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected?

NO

Acyclic? YES

Single-headed? YES

Solution?

Page 14: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected? NO

Acyclic? YES

Single-headed? YES

Solution?

Page 15: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected? NO

Acyclic?

YES

Single-headed? YES

Solution?

Page 16: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected? NO

Acyclic? YES

Single-headed? YES

Solution?

Page 17: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected? NO

Acyclic? YES

Single-headed?

YES

Solution?

Page 18: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

8

Well-formed Dependency Parse?

Connected? NO

Acyclic? YES

Single-headed? YES

Solution?

Page 19: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

9

Well-formed Dependency Parse

Add a special root node with edges to any nodes without heads(main verb and punctuation).

Page 20: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

10

Dependency Parsing: Problem setup

Training data is pairs of word sequences (sentences) anddependency trees:

Dtrain = {(x1,G 1x )...(xM ,G M

x )}xm = [x1, ...xN ]

graph Gx = (Vx,Ax)

vertices Vx = {0, 1, ...,N}edges Ax = {(i , j , k)|i , j ∈ V , k ∈ L(labels)}

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

Page 21: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

10

Dependency Parsing: Problem setup

Training data is pairs of word sequences (sentences) anddependency trees:

Dtrain = {(x1,G 1x )...(xM ,G M

x )}xm = [x1, ...xN ]

graph Gx = (Vx,Ax)

vertices Vx = {0, 1, ...,N}edges Ax = {(i , j , k)|i , j ∈ V , k ∈ L(labels)}

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

Page 22: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

11

Learning a dependency parser

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

where the Gx is a well-formed dependency tree.

Can we learn it using what we know so far?

Enumerationover all possible graphs will be expensive.

How about a classifier that predicts each edge? Maybe.But predicting an edge makes some edges invalid due to theacyclic and single-head constraints.

Page 23: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

11

Learning a dependency parser

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

where the Gx is a well-formed dependency tree.

Can we learn it using what we know so far? Enumerationover all possible graphs will be expensive.

How about a classifier that predicts each edge? Maybe.But predicting an edge makes some edges invalid due to theacyclic and single-head constraints.

Page 24: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

11

Learning a dependency parser

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

where the Gx is a well-formed dependency tree.

Can we learn it using what we know so far? Enumerationover all possible graphs will be expensive.

How about a classifier that predicts each edge?

Maybe.But predicting an edge makes some edges invalid due to theacyclic and single-head constraints.

Page 25: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

11

Learning a dependency parser

We want to learn a model to predict the best graph:

Gx = arg maxGx∈Gx

score(Gx, x)

where the Gx is a well-formed dependency tree.

Can we learn it using what we know so far? Enumerationover all possible graphs will be expensive.

How about a classifier that predicts each edge? Maybe.But predicting an edge makes some edges invalid due to theacyclic and single-head constraints.

Page 26: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

12

Maximum Spanning Tree

Spanning Tree: In graph theory, a spanning tree T of anundirected graph G is a subgraph that includes all of thevertices of G, with the minimum possible number of edges.

Tree: In computer science, a tree is a widely used datastructure (Abstract Data Type) that simulates a hierarchicaltree structure, with a root value and subtrees of children witha parent node, represented as a set of linked nodes.

Page 27: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

12

Maximum Spanning Tree

Spanning Tree: In graph theory, a spanning tree T of anundirected graph G is a subgraph that includes all of thevertices of G, with the minimum possible number of edges.

Tree: In computer science, a tree is a widely used datastructure (Abstract Data Type) that simulates a hierarchicaltree structure, with a root value and subtrees of children witha parent node, represented as a set of linked nodes.

Page 28: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

13

Maximum Spanning Tree

Score all edges, but keep only the max spanning tree usingChu-Liu-Edmonds algorithm, a modification to Kruskal’s algorithmfor extracting Maximum Spanning Trees.

Exact solution in O(N2) time using Chu-Liu-Edmonds.

Page 29: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

14

Kruskal’s algorithm

Input scored edges E

sort E by cost (opposit of score)

G = {}while G not spanning do:

pop the next edge e

if connecting different trees :

add e to G

Return G

Page 30: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

15

Graph-based Dependency Parsing

Decompose the graph score into arc scores:

Gx = arg maxGx∈Gx

score(Gx, x)

= arg maxGx∈Gx

w · Φ(Gx, x) (linear model)

= arg maxGx∈Gx

∑(i ,j ,l)∈Ax

w · φ((i , j , l), x) (arc-factored)

Can learn the weights with a Conditional Random Field!

Page 31: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

16

Feature Representation

What should φ((head , dependent, label), x) be?

unigram: head=had, head=VERB

bigram: head=had & dependent=effect

head=VERB & dependent=NOUN & between=ADJ

head=had & label=dobj & other-label=nsubj

NO!!! Breaks the arc-factored scoring

Page 32: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

16

Feature Representation

What should φ((head , dependent, label), x) be?

unigram: head=had, head=VERB

bigram: head=had & dependent=effect

head=VERB & dependent=NOUN & between=ADJ

head=had & label=dobj & other-label=nsubj

NO!!! Breaks the arc-factored scoring

Page 33: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

16

Feature Representation

What should φ((head , dependent, label), x) be?

unigram: head=had, head=VERB

bigram: head=had & dependent=effect

head=VERB & dependent=NOUN & between=ADJ

head=had & label=dobj & other-label=nsubj

NO!!! Breaks the arc-factored scoring

Page 34: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

16

Feature Representation

What should φ((head , dependent, label), x) be?

unigram: head=had, head=VERB

bigram: head=had & dependent=effect

head=VERB & dependent=NOUN & between=ADJ

head=had & label=dobj & other-label=nsubj

NO!!! Breaks the arc-factored scoring

Page 35: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

16

Feature Representation

What should φ((head , dependent, label), x) be?

unigram: head=had, head=VERB

bigram: head=had & dependent=effect

head=VERB & dependent=NOUN & between=ADJ

head=had & label=dobj & other-label=nsubjNO!!! Breaks the arc-factored scoring

Page 36: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

17

More Global models

Even though inference and learning are global, features arelocalised to arcs.

Can we have more global features?

Yes we can! Considersubgraphs spanning a few edges. But inference becomesharder, requiring more complex dynamic programs and cleverapproximations.

Is it worth it? Syntactic parsing has many applications, thusbetter compromises between speed and accuracy are alwayswelcome!

Page 37: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

17

More Global models

Even though inference and learning are global, features arelocalised to arcs.

Can we have more global features? Yes we can! Considersubgraphs spanning a few edges. But inference becomesharder, requiring more complex dynamic programs and cleverapproximations.

Is it worth it? Syntactic parsing has many applications, thusbetter compromises between speed and accuracy are alwayswelcome!

Page 38: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

18

Transition-based Dependency Parsing

Graph-based dependency parsing restricts the features toperform joint inference efficiently.

Transition-based dependency parsing trades joint inferencefor feature flexibility.

No more argmax over graphs, just use a classifier with anyfeatures we want!

Page 39: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

18

Transition-based Dependency Parsing

Graph-based dependency parsing restricts the features toperform joint inference efficiently.

Transition-based dependency parsing trades joint inferencefor feature flexibility.

No more argmax over graphs, just use a classifier with anyfeatures we want!

Page 40: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

18

Transition-based Dependency Parsing

Graph-based dependency parsing restricts the features toperform joint inference efficiently.

Transition-based dependency parsing trades joint inferencefor feature flexibility.

No more argmax over graphs, just use a classifier with anyfeatures we want!

Page 41: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

19

Joint vs incremental prediction

Joint: score (and enumerate) complete outputs (graphs)

Page 42: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

20

Joint vs incremental prediction

Incremental: predict a sequenceof actions (transitions)constructing the output

Page 43: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

21

Transition system

The actions A the classifier f can predict and their effect on thestate which tracks the prediction: St+1 = S1(α1 . . . αt)

What should the actions (transitions) be for dependency parsing?

Page 44: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

21

Transition system

The actions A the classifier f can predict and their effect on thestate which tracks the prediction: St+1 = S1(α1 . . . αt)

What should the actions (transitions) be for dependency parsing?

Page 45: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

22

Transition system setup

Input: Vertices Vx = {0, 1, ...,N} (words sentence x)

State S = (Stack,B,A):

Arcs A (dependencies predicted so far)Buffer Buf (words left to process)Stack Stack (last-in, first out memory)

Initial state: S0 = ([], [0, 1, ...,N], {})Final state: Sfinal = (Stack , [],A)

Page 46: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

23

Transition system

Shift (Stack , i |Buf ,A)→ (Stack|i ,Buf ,A): push next word fromthe buffer (i) to stack

Reduce (Stack|i ,Buf ,A)→ (Stack,Buf ,A): pop word top of thestack (i) if it has a head

Right-Arc(label) (Stack|i , j |Buf ,A)→(Stack |i |j ,Buf ,A ∪ {(i , j , l)}): create edge (i , j , label) between topof the stack (i) and next in buffer (j), push j

Left-Arc(label) (Stack|i , j |Buf ,A)→ (Stack, j |Buf ,A ∪ {(j , i , l)}):create edge (j , i , label) and pop i , if i has no head

Page 47: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

23

Transition system

Shift (Stack , i |Buf ,A)→ (Stack|i ,Buf ,A): push next word fromthe buffer (i) to stack

Reduce (Stack|i ,Buf ,A)→ (Stack ,Buf ,A): pop word top of thestack (i) if it has a head

Right-Arc(label) (Stack|i , j |Buf ,A)→(Stack |i |j ,Buf ,A ∪ {(i , j , l)}): create edge (i , j , label) between topof the stack (i) and next in buffer (j), push j

Left-Arc(label) (Stack|i , j |Buf ,A)→ (Stack, j |Buf ,A ∪ {(j , i , l)}):create edge (j , i , label) and pop i , if i has no head

Page 48: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

23

Transition system

Shift (Stack , i |Buf ,A)→ (Stack|i ,Buf ,A): push next word fromthe buffer (i) to stack

Reduce (Stack|i ,Buf ,A)→ (Stack ,Buf ,A): pop word top of thestack (i) if it has a head

Right-Arc(label) (Stack|i , j |Buf ,A)→(Stack |i |j ,Buf ,A ∪ {(i , j , l)}): create edge (i , j , label) between topof the stack (i) and next in buffer (j), push j

Left-Arc(label) (Stack|i , j |Buf ,A)→ (Stack, j |Buf ,A ∪ {(j , i , l)}):create edge (j , i , label) and pop i , if i has no head

Page 49: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

23

Transition system

Shift (Stack , i |Buf ,A)→ (Stack|i ,Buf ,A): push next word fromthe buffer (i) to stack

Reduce (Stack|i ,Buf ,A)→ (Stack ,Buf ,A): pop word top of thestack (i) if it has a head

Right-Arc(label) (Stack|i , j |Buf ,A)→(Stack |i |j ,Buf ,A ∪ {(i , j , l)}): create edge (i , j , label) between topof the stack (i) and next in buffer (j), push j

Left-Arc(label) (Stack |i , j |Buf ,A)→ (Stack, j |Buf ,A ∪ {(j , i , l)}):create edge (j , i , label) and pop i , if i has no head

Page 50: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

24

Example

Stack = []Buffer = [ROOT, Economic, news, had, little, effect, on, financial,markets, .]

Action?

Shift

Page 51: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

24

Example

Stack = []Buffer = [ROOT, Economic, news, had, little, effect, on, financial,markets, .]

Action? Shift

Page 52: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

25

Example

Stack = [ROOT]Buffer = [Economic, news, had, little, effect, on, financial,markets, .]

Action?

Shift

Page 53: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

25

Example

Stack = [ROOT]Buffer = [Economic, news, had, little, effect, on, financial,markets, .]

Action? Shift

Page 54: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

26

Example

Stack = [ROOT, Economic]Buffer = [news, had, little, effect, on, financial, markets, .]

Action?

Left-Arc(amod)

Page 55: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

26

Example

Stack = [ROOT, Economic]Buffer = [news, had, little, effect, on, financial, markets, .]

Action? Left-Arc(amod)

Page 56: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

27

Example

Stack = [ROOT]Buffer = [news, had, little, effect, on, financial, markets, .]

Action?

Shift

Page 57: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

27

Example

Stack = [ROOT]Buffer = [news, had, little, effect, on, financial, markets, .]

Action? Shift

Page 58: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

28

Example

Stack = [ROOT, news]Buffer = [had, little, effect, on, financial, markets, .]

Action?

Left-Arc(nsubj)

Page 59: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

28

Example

Stack = [ROOT, news]Buffer = [had, little, effect, on, financial, markets, .]

Action? Left-Arc(nsubj)

Page 60: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

29

Example

Stack = [ROOT]Buffer = [had, little, effect, on, financial, markets, .]

Action?

Right-Arc(root)

Page 61: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

29

Example

Stack = [ROOT]Buffer = [had, little, effect, on, financial, markets, .]

Action? Right-Arc(root)

Page 62: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

30

Example

Stack = [ROOT, had]Buffer = [little, effect, on, financial, markets, .]

Action?

Shift

Page 63: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

30

Example

Stack = [ROOT, had]Buffer = [little, effect, on, financial, markets, .]

Action? Shift

Page 64: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

31

Example

Stack = [ROOT, had, little]Buffer = [effect, on, financial, markets, .]

Action?

Left-Arc(amod)

Page 65: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

31

Example

Stack = [ROOT, had, little]Buffer = [effect, on, financial, markets, .]

Action? Left-Arc(amod)

Page 66: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

32

Example

Stack = [ROOT, had]Buffer = [effect, on, financial, markets, .]

Action?

Right-Arc(dobj)

Page 67: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

32

Example

Stack = [ROOT, had]Buffer = [effect, on, financial, markets, .]

Action? Right-Arc(dobj)

Page 68: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

33

Example

Stack = [ROOT, had, effect]Buffer = [on, financial, markets, .]

Action?

let’s fast-forward...

Page 69: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

33

Example

Stack = [ROOT, had, effect]Buffer = [on, financial, markets, .]

Action? let’s fast-forward...

Page 70: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

34

Example

Stack = [ROOT, had, .]Buffer = []

Empty buffer.

DONE!

Page 71: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

34

Example

Stack = [ROOT, had, .]Buffer = []

Empty buffer. DONE!

Page 72: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

35

Other transition systems?

This was the arc-eager system. Others:

arc-standard (3 actions)easy-first (not left-to-right), etc.

All operate with actions combining:

moving words from the buffer to the stack and back(shift/un-shift)popping words from the stack (reduce)creating labeled arcs left and right

Intuition: Define actions that are easy to learn

Page 73: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

35

Other transition systems?

This was the arc-eager system. Others:

arc-standard (3 actions)

easy-first (not left-to-right), etc.

All operate with actions combining:

moving words from the buffer to the stack and back(shift/un-shift)popping words from the stack (reduce)creating labeled arcs left and right

Intuition: Define actions that are easy to learn

Page 74: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

35

Other transition systems?

This was the arc-eager system. Others:

arc-standard (3 actions)easy-first (not left-to-right), etc.

All operate with actions combining:

moving words from the buffer to the stack and back(shift/un-shift)popping words from the stack (reduce)creating labeled arcs left and right

Intuition: Define actions that are easy to learn

Page 75: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

35

Other transition systems?

This was the arc-eager system. Others:

arc-standard (3 actions)easy-first (not left-to-right), etc.

All operate with actions combining:

moving words from the buffer to the stack and back(shift/un-shift)popping words from the stack (reduce)creating labeled arcs left and right

Intuition: Define actions that are easy to learn

Page 76: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

36

Transition-based Dependency Parsing

Input: sentence x

state S1 = initialize(x); timestep t = 1

while St not final do

action αt = arg maxα∈A

f (α,St)

St+1 = St(αt); t = t + 1

What is f ?

A multiclass classifierWhat do we need to learn it?

learning algorithm (e.g. logistic regression)

labelled training data

feature representation

Page 77: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

36

Transition-based Dependency Parsing

Input: sentence x

state S1 = initialize(x); timestep t = 1

while St not final do

action αt = arg maxα∈A

f (α,St)

St+1 = St(αt); t = t + 1

What is f ? A multiclass classifierWhat do we need to learn it?

learning algorithm (e.g. logistic regression)

labelled training data

feature representation

Page 78: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

36

Transition-based Dependency Parsing

Input: sentence x

state S1 = initialize(x); timestep t = 1

while St not final do

action αt = arg maxα∈A

f (α,St)

St+1 = St(αt); t = t + 1

What is f ? A multiclass classifierWhat do we need to learn it?

learning algorithm (e.g. logistic regression)

labelled training data

feature representation

Page 79: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

37

What are the right actions?

We only have sentences labelledwith graphs:Dtrain = {(x1,G 1

x )...(xM ,G Mx )}

Ask an oracle to tell us theactions constructing the graph!

In our case, a set of rules comparing the current stateS = (Stack ,Buffer ,ArcsPredicted) with Gx returning the correctaction as label

Page 80: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

37

What are the right actions?

We only have sentences labelledwith graphs:Dtrain = {(x1,G 1

x )...(xM ,G Mx )}

Ask an oracle to tell us theactions constructing the graph!

In our case, a set of rules comparing the current stateS = (Stack,Buffer ,ArcsPredicted) with Gx returning the correctaction as label

Page 81: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

37

What are the right actions?

We only have sentences labelledwith graphs:Dtrain = {(x1,G 1

x )...(xM ,G Mx )}

Ask an oracle to tell us theactions constructing the graph!

In our case, a set of rules comparing the current stateS = (Stack ,Buffer ,ArcsPredicted) with Gx returning the correctaction as label

Page 82: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

38

Learning from an oracle

Given a labelled sentence and a transition system, an oracle returnsstates labelled with the correct actions.

Dtrain = {(x1,G 1x )...(xM ,G M

x )}xm = [x1, ..., xN ]

graph Gx = (Vx,Ax)

vertices Vx = {0, 1, ...,N}edges Ax = {(i , j , k)|i , j ∈ V , k ∈ L(labels)}

states Sm= [S1, ...,ST ]

actions αm= [α1, ..., αT ]

Page 83: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

39

Feature Representation

Stack = [ROOT, had, effect]Buffer = [on, financial, markets, .]

What features would help us predict the correction actionRight-Arc(prep)?

Page 84: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

40

Feature Representation

Words/PoS in stack and buffer:wordS1=effect, wordB1=on, wordS2=had, posS1=NOUN,etc.

Dependencies so far:depS1=dobj, depLeftChildS1=amod,depRightChildS1=NULL, etc.

Previous actions:αt−1 = Right-Arc(dobj), etc.

Page 85: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

40

Feature Representation

Words/PoS in stack and buffer:wordS1=effect, wordB1=on, wordS2=had, posS1=NOUN,etc.

Dependencies so far:depS1=dobj, depLeftChildS1=amod,depRightChildS1=NULL, etc.

Previous actions:αt−1 = Right-Arc(dobj), etc.

Page 86: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

40

Feature Representation

Words/PoS in stack and buffer:wordS1=effect, wordB1=on, wordS2=had, posS1=NOUN,etc.

Dependencies so far:depS1=dobj, depLeftChildS1=amod,depRightChildS1=NULL, etc.

Previous actions:αt−1 = Right-Arc(dobj), etc.

Page 87: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

41

Transition-based vs Graph-based parsing

Transition-based tends to be better on shorter sentences,graph-based on longer ones

Graph-based tends to be better on long-range dependencies

Graph-based lacks the rich structural features

Transition-based is greedy and suffers from early mistakes

Actually, can we ameliorate the greedy issue?

Use Beam Search!

Page 88: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

41

Transition-based vs Graph-based parsing

Transition-based tends to be better on shorter sentences,graph-based on longer ones

Graph-based tends to be better on long-range dependencies

Graph-based lacks the rich structural features

Transition-based is greedy and suffers from early mistakes

Actually, can we ameliorate the greedy issue?Use Beam Search!

Page 89: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

41

Transition-based vs Graph-based parsing

Transition-based tends to be better on shorter sentences,graph-based on longer ones

Graph-based tends to be better on long-range dependencies

Graph-based lacks the rich structural features

Transition-based is greedy and suffers from early mistakes

Actually, can we ameliorate the greedy issue?Use Beam Search!

Page 90: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

41

Transition-based vs Graph-based parsing

Transition-based tends to be better on shorter sentences,graph-based on longer ones

Graph-based tends to be better on long-range dependencies

Graph-based lacks the rich structural features

Transition-based is greedy and suffers from early mistakes

Actually, can we ameliorate the greedy issue?Use Beam Search!

Page 91: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

41

Transition-based vs Graph-based parsing

Transition-based tends to be better on shorter sentences,graph-based on longer ones

Graph-based tends to be better on long-range dependencies

Graph-based lacks the rich structural features

Transition-based is greedy and suffers from early mistakes

Actually, can we ameliorate the greedy issue?Use Beam Search!

Page 92: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

42

Non-Projectivity

Arcs are crossing each other

long-range dependencies

free word order

Page 93: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

43

Non-projective Transition-based parsing

The standard stack-based systems cannot do it.

But there are extensions:

swap actions: word reoderingk-planar parsing: use multiple stacks (usually 2)

Standard graph-based parsing handles non-projectivity.

Page 94: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

44

Incremental Language Processing

Other problems solved with similar approaches (a.k.a.transition-based, greedy):

semantic parsing (converting a natural language utterance to alogical form)coreference resolution

Whenever you have a problem with a very large space ofoutputs, worth considering

Page 95: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

45

Evaluation

Head-finding word-accuracy:unlabelled: % of words with the right headlabelled: % of words with the right head and label

Sentence accuracy: % of sentences with correct graph

Page 96: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

46

Bibliography

Chapter 11 from Eisenstein

Nivre and McDonald’s tutorial slides

Nivre’s article on deterministic transition-based dependencyparsing

Nivre and McDonald’s paper comparing their approaches

Page 97: Dependency Parsing - COM4513/6513 Natural Language Processing · Relation extraction, e.g. identify entity pairs (AM, Arctic Monkeys), (Abbey Road, Beatles), (Di erent Class, Pulp)

47

Coming up next week...

Feed-forward Neural Networks

Getting ready for Assignment 2!