Contents
• Dependency Structure
- What is dependency structure?
- Phrase structure vs. Dependency structure
- Dependency Graph
• Dependency Parsers
- MaltParser: Nivre’s algorithm
- MSTParser: Edmonds’s algorithm
- MaltParser vs. MSTParser
- Choi’s algorithm
• Applications
Dependency Structure
• What is dependency?
- A syntactic or semantic relation between words
- Syntactic: NMOD, AMOD; Semantic: LOC, MNR
• Phrase Structure (PS) vs. Dependency Structure (DS)
- Constituents vs. dependencies
- There are no phrasal nodes in DS.
‣ Each node in DS represents a word token.
- In DS, every node except the root is dependent on exactly one other node.
Phrase vs. Dependency
[Figure: two analyses of "She bought a car"]
• Phrase structure: (S (NP (Pro she)) (VP (V bought) (NP (Det a) (N car))))
- Not flexible with word orders
- Language dependent
- No semantic information
• Dependency structure: Root → bought; bought →SBJ She; bought →OBJ car; car →NMOD a
Dependency Graph
• For a sentence x = w1 ... wn, a dependency graph Gx = (Vx, Ex)
- Vx = {w0 = root, w1, ..., wn}
- Ex = {(wi, r, wj) : wi → wj, wi ∈ Vx, wj ∈ Vx − {w0}, r ∈ Rx}
‣ Rx = a set of all possible dependency relations in x
• Well-formed Dependency Graph
- Unique root
- Single head
- Connected
- Acyclic
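The four conditions above can be checked mechanically. A minimal sketch (not from the slides), assuming arcs are (head, relation, dependent) triples over token indices with 0 reserved for the artificial root w0:

```python
def is_well_formed(n_tokens, arcs):
    """Check unique root, single head, connectedness, and acyclicity
    for arcs over tokens 1..n_tokens, with 0 as the artificial root."""
    heads = {}
    for head, _rel, dep in arcs:
        if dep in heads:                 # single head: at most one head each
            return False
        heads[dep] = head
    # unique root: exactly one token attaches directly to the root (0)
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    # connected + acyclic: following heads from any token must reach 0
    for tok in range(1, n_tokens + 1):
        seen, node = set(), tok
        while node != 0:
            if node in seen or node not in heads:
                return False             # cycle, or headless (disconnected) token
            seen.add(node)
            node = heads[node]
    return True

# "she bought a car": she=1, bought=2, a=3, car=4
arcs = [(0, "ROOT", 2), (2, "SBJ", 1), (2, "OBJ", 4), (4, "NMOD", 3)]
print(is_well_formed(4, arcs))           # True
```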
Projectivity vs. Non-projectivity
• Projectivity means no crossing edges.
• Why projectivity?
- The original sentence can be regenerated with the same word order.
- Projective parsing is cheaper (O(n) vs. O(n²)).
- Non-projective relations are relatively rare.
[Figure: "root She bought a car" is projective; "root She bought a car yesterday that was blue" is non-projective, since the arc from 'car' to the relative clause crosses the arc from 'bought' to 'yesterday'.]
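Because projectivity is just the absence of crossing edges, it can be tested with a pairwise span check. A minimal sketch, assuming arcs are (head, dependent) index pairs with 0 as the root:

```python
def is_projective(arcs):
    """Return True if no two arcs (head, dependent) cross."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            # two spans cross iff exactly one endpoint lies strictly inside the other
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

# "root She bought a car" (She=1 ... car=4): projective
print(is_projective([(0, 2), (2, 1), (2, 4), (4, 3)]))                  # True
# add 'yesterday'=5 and 'that'=6: car -> that crosses bought -> yesterday
print(is_projective([(0, 2), (2, 1), (2, 4), (4, 3), (2, 5), (4, 6)]))  # False
```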
Dependency Parsers
• Two state-of-the-art dependency parsers
- MaltParser: performed the best in the CoNLL 2007 shared task
- MSTParser: performed the best in the CoNLL 2006 shared task
• MaltParser
- Developed by Johan Hall, Jens Nilsson, and Joakim Nivre
- Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n²))
• MSTParser
- Developed by Ryan McDonald
- Eisner's algorithm (projective, O(k log k)), Edmonds's algorithm (non-projective, O(kn²))
Nivre’s Algorithm
• Based on the Shift-Reduce algorithm
• S = a stack
• I = a list of remaining input tokens
• A = a set of dependency arcs
• Example: "she bought a car"
[Walkthrough]

Action                    | Stack S       | Input I          | Arcs A
Initialize                | []            | she bought a car | {}
Shift 'she'               | [she]         | bought a car     | {}
Left-Arc: she ← bought    | []            | bought a car     | {she ← bought}
Shift 'bought'            | [bought]      | a car            | {she ← bought}
Shift 'a'                 | [bought, a]   | car              | {she ← bought}
Left-Arc: a ← car         | [bought]      | car              | {she ← bought, a ← car}
Right-Arc: bought → car   | [bought, car] |                  | {she ← bought, a ← car, bought → car}
Terminate (no need to reduce 'car' or 'bought')
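To make the transition system concrete, here is a minimal sketch of the shift-reduce loop behind the trace above. It is not MaltParser's implementation: the transitions come from a hard-coded oracle standing in for the classifier that MaltParser learns:

```python
def parse(tokens, oracle):
    """Run Nivre-style transitions chosen by 'oracle'; return (head, dep) arcs."""
    stack, buffer, arcs = [], list(tokens), []
    while buffer:
        action = oracle(stack, buffer)
        if action == "LEFT-ARC":        # buffer front becomes head of stack top
            arcs.append((buffer[0], stack.pop()))
        elif action == "RIGHT-ARC":     # stack top becomes head of buffer front,
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.pop(0)) # which is pushed for later reduction
        elif action == "REDUCE":        # pop a token that already has its head
            stack.pop()
        else:                           # SHIFT: push the buffer front
            stack.append(buffer.pop(0))
    return arcs

# Hard-coded transition sequence for "she bought a car"
gold = iter(["SHIFT", "LEFT-ARC", "SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC"])
print(parse(["she", "bought", "a", "car"], lambda s, b: next(gold)))
# [('bought', 'she'), ('car', 'a'), ('bought', 'car')]
```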
Edmonds's Algorithm
• Based on the Maximum Spanning Tree algorithm
• Algorithm (a code sketch follows the worked example below)
1. Build a complete graph.
2. For each node, keep only the incoming edge with the maximum score.
3. If there is no cycle, go to #5.
4. If there is a cycle, treat the cycle as a single vertex, update the scores of all edges entering it, and go to #2.
5. Break each cycle by removing the edge that would give a node multiple heads.
Edmonds's Algorithm
[Worked example: "root John saw Mary" with edge scores root→saw 10, root→John 9, root→Mary 9, John→saw 20, John→Mary 3, saw→John 30, saw→Mary 30, Mary→saw 0, Mary→John 11. Keeping the best incoming edges (John→saw 20, saw→John 30, saw→Mary 30) creates a cycle between 'saw' and 'John'. Contracting it and rescoring the entering edges gives root→cycle 40 (via saw; 29 via John) and Mary→cycle 31, so the cycle is entered through root→saw, yielding the final tree root→saw (10), saw→John (30), saw→Mary (30).]
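The contraction-and-rescoring loop can be written as a short recursion. The sketch below is not the MSTParser code, just one way to realize steps 2-5 on the example above; it assumes every non-root node has at least one incoming edge, with score[h][d] the weight of edge h → d:

```python
from collections import defaultdict

def find_cycle(parent):
    """Return the set of nodes on a cycle in {dep: head}, or None."""
    for start in parent:
        path, node = [], start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:                          # walked back into the path
            return set(path[path.index(node):])
    return None

def edmonds(nodes, score, root):
    """Maximum spanning arborescence; returns {dep: head}."""
    # Step 2: keep only the best-scoring incoming edge for each non-root node.
    parent = {d: max((h for h in nodes if d in score.get(h, {})),
                     key=lambda h: score[h][d])
              for d in nodes if d != root}
    cycle = find_cycle(parent)
    if cycle is None:                             # Step 3: already a tree
        return parent
    # Step 4: contract the cycle into one vertex c; an edge entering the cycle
    # at d is rescored by what it gains over d's current cycle edge.
    c, cw = object(), sum(score[parent[v]][v] for v in cycle)
    new_score, enter, leave = defaultdict(dict), {}, {}
    for h in nodes:
        for d, w in score.get(h, {}).items():
            if h in cycle and d not in cycle:     # edge leaving the cycle
                if w > new_score[c].get(d, float("-inf")):
                    new_score[c][d], leave[d] = w, h
            elif h not in cycle and d in cycle:   # edge entering the cycle
                w2 = w - score[parent[d]][d] + cw
                if w2 > new_score[h].get(c, float("-inf")):
                    new_score[h][c], enter[h] = w2, d
            elif h not in cycle:                  # edge untouched by the cycle
                new_score[h][d] = w
    sub = edmonds([n for n in nodes if n not in cycle] + [c], new_score, root)
    # Step 5: expand c; the chosen entering edge breaks the cycle.
    head_of_c = sub.pop(c)
    tree = {d: (leave[d] if h is c else h) for d, h in sub.items()}
    tree.update({v: parent[v] for v in cycle})
    tree[enter[head_of_c]] = head_of_c
    return tree

score = {"root": {"saw": 10, "John": 9, "Mary": 9},
         "saw":  {"John": 30, "Mary": 30},
         "John": {"saw": 20, "Mary": 3},
         "Mary": {"saw": 0, "John": 11}}
print(edmonds(["root", "saw", "John", "Mary"], score, "root"))
# {'Mary': 'saw', 'John': 'saw', 'saw': 'root'} (key order may vary)
```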
MaltParser vs. MSTParser
• Advantages
- MaltParser: low complexity, more accurate on short-distance dependencies
- MSTParser: high accuracy, more accurate on long-distance dependencies
• Merge MaltParser and MSTParser in the learning stage
Choi's Algorithm
• Projective dependency parsing algorithm
- Motivation: search more exhaustively than MaltParser while keeping the complexity lower than MSTParser's
- Intuition: in a projective dependency graph, every word can find its head among the words in adjacent phrases
- Searching: start with the adjacent edge node, then jump to its head
- Complexity: O(k·n), where k is the number of words in each phrase
[Example: "She bought a car yesterday that was blue"]
Choi's Algorithm
[Walkthrough: words A-E; candidate head attachments are scored (0.9, 0.6, 0.7, 0.5, 0.8, ...), low-scoring candidates are pruned (marked X), and the final parse keeps the arcs scored 0.9, 0.7, 0.8, 0.5, and 0.8.]
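The head-search intuition above (start at the adjacent word, then jump along the current head chain, so at most k candidates are scored per word) might look like the following. This is a loose illustration of the stated intuition, not Choi's published implementation; 'score' is a hypothetical stand-in for a learned edge-scoring model:

```python
def find_head(i, neighbor, heads, score):
    """Pick the best-scoring head for word i among the chain of candidates
    reached by starting at the adjacent word and jumping head to head."""
    cand, best = neighbor, None
    while cand is not None:
        if best is None or score(cand, i) > score(best, i):
            best = cand
        cand = heads.get(cand)          # jump to the candidate's current head
    return best

# Toy run: word 0 considers its neighbor 1 (score 0.9), then 1's head 2 (0.4)
score = lambda h, d: {(1, 0): 0.9, (2, 0): 0.4}.get((h, d), 0.0)
print(find_head(0, 1, {1: 2, 2: None}, score))   # 1
```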