Dependency Parsing Jinho D. Choi University of Colorado Preliminary Exam March 4, 2009


Contents

• Dependency Structure

- What is dependency structure?

- Phrase structure vs. Dependency structure

- Dependency Graph

• Dependency Parsers

- MaltParser: Nivre’s algorithm

- MSTParser: Edmonds’s algorithm

- MaltParser vs. MSTParser

- Choi’s algorithm

• Applications

Dependency Structure

• What is dependency?

- Syntactic or semantic relation between lexical items

- Syntactic: NMOD, AMOD; Semantic: LOC, MNR

• Phrase Structure (PS) vs. Dependency Structure (DS)

- Constituents vs. Dependencies

- There are no phrasal nodes in DS.

  ‣ Each node in DS represents a word token.

- In DS, every node except the root is dependent on exactly one other node.

Phrase vs. Dependency

• Phrase Structure of 'She bought a car':

  (S (NP (Pro she)) (VP (V bought) (NP (Det a) (N car))))

- Not flexible with word orders

- Language dependent

- No semantic information

• Dependency Structure of 'She bought a car':

  Root → bought, bought →SBJ she, bought →OBJ car, car →NMOD a

Dependency Graph

• For a sentence x = w1..wn, a dependency graph Gx = (Vx, Ex)

- Vx = {w0 = root, w1, ... , wn}

- Ex = {(wi, r, wj) : wi → wj, wi ∈ Vx, wj ∈ Vx − {w0}, r ∈ Rx}

  ‣ Rx = the set of all possible dependency relations in x

• Well-formed Dependency Graph

- Unique root

- Single head

- Connected

- Acyclic
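The four conditions above can be checked mechanically. A minimal sketch, assuming arcs are encoded as (head, dependent) index pairs with w0 = 0 as the artificial root (the function name and encoding are illustrative, not from the slides):

```python
def is_well_formed(n, arcs):
    """Check the well-formedness conditions for a dependency graph over
    tokens w1..wn, where arcs is a set of (head, dependent) index pairs
    and index 0 is the artificial root w0."""
    heads = {}
    for h, d in arcs:
        if d == 0:                       # the root may not be a dependent
            return False
        if d in heads:                   # 'single head' violated
            return False
        heads[d] = h
    if len(heads) != n:                  # every token needs a head -> connected
        return False
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False                     # 'unique root' violated
    for d in heads:                      # 'acyclic': every head chain reaches w0
        seen = set()
        while d != 0:
            if d in seen:
                return False
            seen.add(d)
            if d not in heads:           # head index outside the sentence
                return False
            d = heads[d]
    return True

# 'She bought a car': bought(2) heads she(1) and car(4); car(4) heads a(3)
assert is_well_formed(4, {(0, 2), (2, 1), (2, 4), (4, 3)})
assert not is_well_formed(4, {(0, 2), (2, 1), (3, 4), (4, 3)})  # 3 and 4 form a cycle
```

Because every token must have exactly one head and every head chain must reach w0, connectedness follows automatically once the other three conditions hold.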

Projectivity vs. Non-projectivity

• Projectivity means no crossing edges.

• Why projectivity?

- The original sentence can be regenerated with the same word order

- Parsing is less expensive (O(n) vs. O(n²))

- Non-projective relations are relatively rare

• Projective: 'root She bought a car'

• Non-projective: 'root She bought a car yesterday that was blue' (the arc from 'car' to the relative clause 'that was blue' crosses the arc from 'bought' to 'yesterday')
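Projectivity ("no crossing edges") can be tested pairwise over arcs. A sketch, assuming heads are stored as a dict from token index to head index (0 = root); the token indices below follow 'She(1) bought(2) a(3) car(4)':

```python
def is_projective(heads):
    """heads maps token index -> head index (0 is the artificial root).
    Two arcs cross iff one starts strictly inside the other's span and
    ends strictly outside it."""
    spans = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for a, b in spans:
        for c, e in spans:
            if a < c < b < e:            # spans (a,b) and (c,e) cross
                return False
    return True

# 'She bought a car' is projective:
assert is_projective({1: 2, 2: 0, 3: 4, 4: 2})
# 'She bought a car yesterday that was blue': the arc car(4) -> that(6)
# crosses bought(2) -> yesterday(5), so the graph is non-projective:
assert not is_projective({1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 4})
```

The pairwise check is O(n²) in the number of arcs, which is fine for illustration; a single left-to-right sweep can do it faster.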

Dependency Parsers

• Two state-of-the-art dependency parsers

- MaltParser: performed the best in the CoNLL 2007 shared task

- MSTParser: performed the best in the CoNLL 2006 shared task

• MaltParser

- Developed by Johan Hall, Jens Nilsson, and Joakim Nivre

- Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n²))

• MSTParser

- Developed by Ryan McDonald

- Eisner's algorithm (projective, O(k log k)), Edmonds's algorithm (non-projective, O(kn²))

Nivre's Algorithm

• Based on the Shift-Reduce algorithm

• S = a stack, I = a list of remaining input tokens, A = a set of dependency arcs

• Parsing 'she bought a car':

  Transition                    S               I                      A (arc added)
  Initialize                    []              [she, bought, a, car]
  Shift : 'she'                 [she]           [bought, a, car]
  Left-Arc : 'she ← bought'     []              [bought, a, car]       she ← bought
  Shift : 'bought'              [bought]        [a, car]
  Shift : 'a'                   [bought, a]     [car]
  Left-Arc : 'a ← car'          [bought]        [car]                  a ← car
  Right-Arc : 'bought → car'    [bought, car]   []                     bought → car

• Terminate (no need to reduce 'car' or 'bought')
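The trace above can be reproduced with a small simulator that applies a given sequence of arc-eager transitions; the function name and transition labels here are illustrative, not MaltParser's actual API:

```python
def nivre_parse(tokens, transitions):
    """Apply a sequence of Nivre-style (arc-eager) transitions.

    S = stack, I = remaining input, A = list of (head, dependent) arcs.
    SH = Shift, LA = Left-Arc, RA = Right-Arc, RE = Reduce.
    """
    S, I, A = [], list(tokens), []
    for t in transitions:
        if t == "SH":                 # push the next input token onto the stack
            S.append(I.pop(0))
        elif t == "LA":               # stack top <- next input; pop the stack top
            A.append((I[0], S.pop()))
        elif t == "RA":               # stack top -> next input; push the input token
            A.append((S[-1], I[0]))
            S.append(I.pop(0))
        elif t == "RE":               # pop a token that already has its head
            S.pop()
    return A

# 'she bought a car' with the transition sequence from the trace above:
arcs = nivre_parse(["she", "bought", "a", "car"],
                   ["SH", "LA", "SH", "SH", "LA", "RA"])
# arcs == [("bought", "she"), ("car", "a"), ("bought", "car")]
```

In a real parser the transition at each step is predicted by a classifier rather than supplied as an oracle sequence, which is what makes the overall parse O(n).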

Edmonds's Algorithm

• Based on the Maximum Spanning Tree (MST) algorithm

• Algorithm

1. Build a complete graph

2. For each node, keep only the incoming edge with the maximum score

3. If there is no cycle, go to step 5

4. If there is a cycle, contract it into a single vertex and update the scores of all edges entering the cycle; go to step 2

5. Expand any contracted cycles, breaking each by removing the one edge that would give a node multiple heads

Edmonds's Algorithm

• Worked example on 'John saw Mary' (w0 = root); edge scores:

  root→saw 10, root→John 9, root→Mary 9, John→saw 20, saw→John 30, saw→Mary 30, Mary→saw 0, John→Mary 3, Mary→John 11

• Step 2 (keep the maximum incoming edge per node): John→saw 20, saw→John 30, saw→Mary 30; this leaves the cycle John ↔ saw

• Step 4 (contract the cycle C = {John, saw}; cycle weight = 20 + 30 = 50): root→C = 40 (entering at saw: 10 + 50 − 20), root→C = 29 (entering at John: 9 + 50 − 30), Mary→C = 31 (entering at John: 11 + 50 − 30), Mary→C = 30 (entering at saw: 0 + 50 − 20); keeping maximum incoming edges gives root→C 40 and C→Mary 30

• Step 5 (expand C; the winning edge enters at saw, so drop the cycle edge John→saw): final tree root→saw 10, saw→John 30, saw→Mary 30
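The contract-and-expand procedure can be sketched recursively. This is a generic Chu-Liu/Edmonds implementation (not MSTParser's actual code), run on the 'John saw Mary' example with nodes 0 = root, 1 = saw, 2 = John, 3 = Mary:

```python
def find_cycle(parent):
    """Return the nodes on a cycle in the best-head map, or None."""
    for start in parent:
        path, node = [start], start
        while node in parent:
            node = parent[node]
            if node in path:
                return path[path.index(node):]
            path.append(node)
    return None

def chu_liu_edmonds(score, nodes, root):
    """score: {(head, dep): weight}; returns {dep: head} for the max spanning tree."""
    # Step 2: keep only the maximum-score incoming edge for every non-root node.
    parent = {}
    for d in nodes:
        if d == root:
            continue
        heads = [h for h in nodes if (h, d) in score]
        parent[d] = max(heads, key=lambda h: score[(h, d)])
    cycle = find_cycle(parent)
    if cycle is None:
        return parent                         # Step 3: no cycle, done
    # Step 4: contract the cycle into a fresh vertex c and rescore edges.
    cyc = set(cycle)
    cyc_weight = sum(score[(parent[d], d)] for d in cyc)
    c = max(nodes) + 1
    new_score, enter, leave = {}, {}, {}
    for (h, d), w in score.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:                          # edge entering the cycle
            adj = w + cyc_weight - score[(parent[d], d)]
            if new_score.get((h, c), float("-inf")) < adj:
                new_score[(h, c)] = adj
                enter[h] = d                  # remember where it enters the cycle
        elif h in cyc:                        # edge leaving the cycle
            if new_score.get((c, d), float("-inf")) < w:
                new_score[(c, d)] = w
                leave[d] = h                  # remember which cycle node it leaves
        else:
            new_score[(h, d)] = w
    sub = chu_liu_edmonds(new_score, [n for n in nodes if n not in cyc] + [c], root)
    # Step 5: expand the cycle, dropping the edge that would give a second head.
    result = {}
    for d, h in sub.items():
        if d == c:
            entry = enter[h]
            result[entry] = h
            for x in cyc:
                if x != entry:
                    result[x] = parent[x]
        elif h == c:
            result[d] = leave[d]
        else:
            result[d] = h
    return result

score = {(0, 1): 10, (0, 2): 9, (0, 3): 9, (2, 1): 20, (1, 2): 30,
         (1, 3): 30, (3, 1): 0, (2, 3): 3, (3, 2): 11}
tree = chu_liu_edmonds(score, [0, 1, 2, 3], root=0)
# tree == {1: 0, 2: 1, 3: 1}: root -> saw, saw -> John, saw -> Mary (total 70)
```

Each contraction removes at least one node, so the recursion depth is bounded by n, giving the O(n²)-per-level behavior the slides cite.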

MaltParser vs. MSTParser

• Advantages

- MaltParser: low complexity, more accurate for short-distance dependencies

- MSTParser: high accuracy, more accurate for long-distance dependencies

• Merge MaltParser and MSTParser in their learning stages

Choi's Algorithm

• Projective dependency parsing algorithm

- Motivation: do a more exhaustive search than MaltParser while keeping the complexity lower than that of MSTParser

- Intuition: in a projective dependency graph, every word can find its head in an adjacent phrase

- Searching: start with the edge node, then jump to its head

- Complexity: O(k·n), where k is the number of words in each phrase

She bought a car yesterday that was blue

Choi's Algorithm

[Figure: the original slides step through the head search over five nodes A–E. Candidate head edges are scored (e.g., 0.9 vs. 0.6, then 0.7 vs. 0.5, then 0.8); at each step the lower-scored candidate is discarded (marked X) and the search continues from the head of the winning edge, until every node keeps a single highest-scoring head.]

Applications

• Semantic Role Labeling

- CoNLL 2008-9 shared tasks

• Sentence Compression

- Relation extraction

• Sentence Alignment

- Paraphrase detection, machine translation

• Sentiment Analysis