Dependency Parsing Jinho D. Choi University of Colorado Preliminary Exam March 4, 2009


Contents

• Dependency Structure

- What is dependency structure?

- Phrase structure vs. Dependency structure

- Dependency Graph

• Dependency Parsers

- MaltParser: Nivre’s algorithm

- MSTParser: Edmonds’s algorithm

- MaltParser vs. MSTParser

- Choi’s algorithm

• Applications

Dependency Structure

• What is dependency?

- Syntactic or semantic relation between lexical items

- Syntactic: NMOD, AMOD; Semantic: LOC, MNR

• Phrase Structure (PS) vs. Dependency Structure (DS)

- Constituents vs. Dependencies

- There are no phrasal nodes in DS.

  ‣ Each node in DS represents a word token.

- In DS, every node except the root is dependent on exactly one other node.

Phrase vs. Dependency

• Phrase Structure of 'She bought a car':

  (S (NP (Pro she)) (VP (V bought) (NP (Det a) (N car))))

- Not flexible with word orders

- Language dependent

- No semantic information

• Dependency Structure of 'She bought a car':

  Root → bought, bought →SBJ she, bought →OBJ car, car →NMOD a

Dependency Graph

• For a sentence x = w1..wn, a dependency graph Gx = (Vx, Ex)

- Vx = {w0 = root, w1, ... , wn}

- Ex = {(wi, r, wj) : wi → wj, wi ∈ Vx, wj ∈ Vx − {w0}, r ∈ Rx}

  ‣ Rx = the set of all possible dependency relations in x

• Well-formed Dependency Graph

- Unique root

- Single head

- Connected

- Acyclic
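The four conditions above can be checked mechanically. A minimal sketch, assuming arcs are encoded as (head, dependent) index pairs with w0 = 0 as the artificial root (the function name and encoding are illustrative, not from the slides):

```python
def is_well_formed(n, arcs):
    """Check the well-formedness conditions for a dependency graph over
    tokens w1..wn, where arcs is a set of (head, dependent) index pairs
    and index 0 is the artificial root w0."""
    heads = {}
    for h, d in arcs:
        if d == 0:                       # the root may not be a dependent
            return False
        if d in heads:                   # 'single head' violated
            return False
        heads[d] = h
    if len(heads) != n:                  # every token needs a head -> connected
        return False
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False                     # 'unique root' violated
    for d in heads:                      # 'acyclic': every head chain reaches w0
        seen = set()
        while d != 0:
            if d in seen:
                return False
            seen.add(d)
            if d not in heads:           # head index outside the sentence
                return False
            d = heads[d]
    return True

# 'She bought a car': bought(2) heads she(1) and car(4); car(4) heads a(3)
assert is_well_formed(4, {(0, 2), (2, 1), (2, 4), (4, 3)})
assert not is_well_formed(4, {(0, 2), (2, 1), (3, 4), (4, 3)})  # 3 and 4 form a cycle
```

Because every token must have exactly one head and every head chain must reach w0, connectedness follows automatically once the other three conditions hold.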

Projectivity vs. Non-projectivity

• Projectivity means no crossing edges.

• Why projectivity?

- The original sentence can be regenerated with the same word order

- Parsing is less expensive (O(n) vs. O(n²))

- Non-projective relations are relatively rare

• Projective: 'root She bought a car'

• Non-projective: 'root She bought a car yesterday that was blue' (the arc from 'car' to the relative clause 'that was blue' crosses the arc from 'bought' to 'yesterday')
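Projectivity ("no crossing edges") can be tested pairwise over arcs. A sketch, assuming heads are stored as a dict from token index to head index (0 = root); the token indices below follow 'She(1) bought(2) a(3) car(4)':

```python
def is_projective(heads):
    """heads maps token index -> head index (0 is the artificial root).
    Two arcs cross iff one starts strictly inside the other's span and
    ends strictly outside it."""
    spans = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for a, b in spans:
        for c, e in spans:
            if a < c < b < e:            # spans (a,b) and (c,e) cross
                return False
    return True

# 'She bought a car' is projective:
assert is_projective({1: 2, 2: 0, 3: 4, 4: 2})
# 'She bought a car yesterday that was blue': the arc car(4) -> that(6)
# crosses bought(2) -> yesterday(5), so the graph is non-projective:
assert not is_projective({1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 4})
```

The pairwise check is O(n²) in the number of arcs, which is fine for illustration; a single left-to-right sweep can do it faster.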

Dependency Parsers

• Two state-of-the-art dependency parsers

- MaltParser: performed the best in the CoNLL 2007 shared task

- MSTParser: performed the best in the CoNLL 2006 shared task

• MaltParser

- Developed by Johan Hall, Jens Nilsson, and Joakim Nivre

- Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n²))

• MSTParser

- Developed by Ryan McDonald

- Eisner's algorithm (projective, O(k log k)), Edmonds's algorithm (non-projective, O(kn²))

Nivre's Algorithm

• Based on the Shift-Reduce algorithm

• S = a stack, I = a list of remaining input tokens, A = a set of dependency arcs

• Parsing 'she bought a car':

  Transition                    S               I                      A (arc added)
  Initialize                    []              [she, bought, a, car]
  Shift : 'she'                 [she]           [bought, a, car]
  Left-Arc : 'she ← bought'     []              [bought, a, car]       she ← bought
  Shift : 'bought'              [bought]        [a, car]
  Shift : 'a'                   [bought, a]     [car]
  Left-Arc : 'a ← car'          [bought]        [car]                  a ← car
  Right-Arc : 'bought → car'    [bought, car]   []                     bought → car

• Terminate (no need to reduce 'car' or 'bought')
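The trace above can be reproduced with a small simulator that applies a given sequence of arc-eager transitions; the function name and transition labels here are illustrative, not MaltParser's actual API:

```python
def nivre_parse(tokens, transitions):
    """Apply a sequence of Nivre-style (arc-eager) transitions.

    S = stack, I = remaining input, A = list of (head, dependent) arcs.
    SH = Shift, LA = Left-Arc, RA = Right-Arc, RE = Reduce.
    """
    S, I, A = [], list(tokens), []
    for t in transitions:
        if t == "SH":                 # push the next input token onto the stack
            S.append(I.pop(0))
        elif t == "LA":               # stack top <- next input; pop the stack top
            A.append((I[0], S.pop()))
        elif t == "RA":               # stack top -> next input; push the input token
            A.append((S[-1], I[0]))
            S.append(I.pop(0))
        elif t == "RE":               # pop a token that already has its head
            S.pop()
    return A

# 'she bought a car' with the transition sequence from the trace above:
arcs = nivre_parse(["she", "bought", "a", "car"],
                   ["SH", "LA", "SH", "SH", "LA", "RA"])
# arcs == [("bought", "she"), ("car", "a"), ("bought", "car")]
```

In a real parser the transition at each step is predicted by a classifier rather than supplied as an oracle sequence, which is what makes the overall parse O(n).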

Edmonds's Algorithm

• Based on the Maximum Spanning Tree (MST) algorithm

• Algorithm

1. Build a complete graph

2. For each node, keep only the incoming edge with the maximum score

3. If there is no cycle, go to step 5

4. If there is a cycle, contract it into a single vertex and update the scores of all edges entering the cycle; go to step 2

5. Expand any contracted cycles, breaking each by removing the one edge that would give a node multiple heads

Edmonds's Algorithm

• Worked example on 'John saw Mary' (w0 = root); edge scores:

  root→saw 10, root→John 9, root→Mary 9, John→saw 20, saw→John 30, saw→Mary 30, Mary→saw 0, John→Mary 3, Mary→John 11

• Step 2 (keep the maximum incoming edge per node): John→saw 20, saw→John 30, saw→Mary 30; this leaves the cycle John ↔ saw

• Step 4 (contract the cycle C = {John, saw}; cycle weight = 20 + 30 = 50): root→C = 40 (entering at saw: 10 + 50 − 20), root→C = 29 (entering at John: 9 + 50 − 30), Mary→C = 31 (entering at John: 11 + 50 − 30), Mary→C = 30 (entering at saw: 0 + 50 − 20); keeping maximum incoming edges gives root→C 40 and C→Mary 30

• Step 5 (expand C; the winning edge enters at saw, so drop the cycle edge John→saw): final tree root→saw 10, saw→John 30, saw→Mary 30
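The contract-and-expand procedure can be sketched recursively. This is a generic Chu-Liu/Edmonds implementation (not MSTParser's actual code), run on the 'John saw Mary' example with nodes 0 = root, 1 = saw, 2 = John, 3 = Mary:

```python
def find_cycle(parent):
    """Return the nodes on a cycle in the best-head map, or None."""
    for start in parent:
        path, node = [start], start
        while node in parent:
            node = parent[node]
            if node in path:
                return path[path.index(node):]
            path.append(node)
    return None

def chu_liu_edmonds(score, nodes, root):
    """score: {(head, dep): weight}; returns {dep: head} for the max spanning tree."""
    # Step 2: keep only the maximum-score incoming edge for every non-root node.
    parent = {}
    for d in nodes:
        if d == root:
            continue
        heads = [h for h in nodes if (h, d) in score]
        parent[d] = max(heads, key=lambda h: score[(h, d)])
    cycle = find_cycle(parent)
    if cycle is None:
        return parent                         # Step 3: no cycle, done
    # Step 4: contract the cycle into a fresh vertex c and rescore edges.
    cyc = set(cycle)
    cyc_weight = sum(score[(parent[d], d)] for d in cyc)
    c = max(nodes) + 1
    new_score, enter, leave = {}, {}, {}
    for (h, d), w in score.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:                          # edge entering the cycle
            adj = w + cyc_weight - score[(parent[d], d)]
            if new_score.get((h, c), float("-inf")) < adj:
                new_score[(h, c)] = adj
                enter[h] = d                  # remember where it enters the cycle
        elif h in cyc:                        # edge leaving the cycle
            if new_score.get((c, d), float("-inf")) < w:
                new_score[(c, d)] = w
                leave[d] = h                  # remember which cycle node it leaves
        else:
            new_score[(h, d)] = w
    sub = chu_liu_edmonds(new_score, [n for n in nodes if n not in cyc] + [c], root)
    # Step 5: expand the cycle, dropping the edge that would give a second head.
    result = {}
    for d, h in sub.items():
        if d == c:
            entry = enter[h]
            result[entry] = h
            for x in cyc:
                if x != entry:
                    result[x] = parent[x]
        elif h == c:
            result[d] = leave[d]
        else:
            result[d] = h
    return result

score = {(0, 1): 10, (0, 2): 9, (0, 3): 9, (2, 1): 20, (1, 2): 30,
         (1, 3): 30, (3, 1): 0, (2, 3): 3, (3, 2): 11}
tree = chu_liu_edmonds(score, [0, 1, 2, 3], root=0)
# tree == {1: 0, 2: 1, 3: 1}: root -> saw, saw -> John, saw -> Mary (total 70)
```

Each contraction removes at least one node, so the recursion depth is bounded by n, giving the O(n²)-per-level behavior the slides cite.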

MaltParser vs. MSTParser

• Advantages

- MaltParser: low complexity, more accurate for short-distance dependencies

- MSTParser: high accuracy, more accurate for long-distance dependencies

• Merge MaltParser and MSTParser in their learning stages

Choi's Algorithm

• Projective dependency parsing algorithm

- Motivation: do a more exhaustive search than MaltParser while keeping the complexity lower than that of MSTParser

- Intuition: in a projective dependency graph, every word can find its head in an adjacent phrase

- Searching: start with the edge node, then jump to its head

- Complexity: O(k·n), where k is the number of words in each phrase

She bought a car yesterday that was blue

Choi's Algorithm

[Figure: the original slides step through the head search over five nodes A–E. Candidate head edges are scored (e.g., 0.9 vs. 0.6, then 0.7 vs. 0.5, then 0.8); at each step the lower-scored candidate is discarded (marked X) and the search continues from the head of the winning edge, until every node keeps a single highest-scoring head.]

Applications

• Semantic Role Labeling

- CoNLL 2008-9 shared tasks

• Sentence Compression

- Relation extraction

• Sentence Alignment

- Paraphrase detection, machine translation

• Sentiment Analysis