Forest-based Semantic Role Labeling
Hao Xiong, Haitao Mi, Yang Liu and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
AAAI 2010, Atlanta, 7/15/10
Semantic Role Labeling
Given a sentence and its verbs
  Identify the arguments of the verbs
  Assign semantic labels (the roles they play)
[Agent This company] [ArgMod-TeMPoral last year] sold [Patient 1000 cars] [ArgMod-LOCation in the U.S.]
PropBank (Kingsbury and Palmer 2002)
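The task output can be pictured as a set of labeled token spans attached to one verb. The sketch below is only an illustration of that idea: the `Argument` class, the `render` helper and the token indices are assumptions made for this example, not part of PropBank or of the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    start: int  # index of the first token of the span
    end: int    # index one past the last token of the span
    role: str   # semantic label assigned to the span


def render(tokens, arguments):
    """Return the sentence with each labeled span wrapped in [ROLE ...]."""
    by_start = {a.start: a for a in arguments}
    pieces, i = [], 0
    while i < len(tokens):
        arg = by_start.get(i)
        if arg is None:
            # Token outside every argument span (here: the verb itself).
            pieces.append(tokens[i])
            i += 1
        else:
            words = " ".join(tokens[arg.start:arg.end])
            pieces.append("[" + arg.role + " " + words + "]")
            i = arg.end
    return " ".join(pieces)


# The example sentence from the slide, split on whitespace (hypothetical tokenization).
tokens = "This company last year sold 1000 cars in the U.S.".split()
arguments = [
    Argument(0, 2, "Agent"),
    Argument(2, 4, "ArgMod-TeMPoral"),
    Argument(5, 7, "Patient"),
    Argument(7, 10, "ArgMod-LOCation"),
]
labeled = render(tokens, arguments)
```

Storing spans rather than per-token tags keeps each argument tied to one constituent, which is how the later slides treat candidates.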
One Conventional Approach
[Patient the role of Celimene] is played by [Agent Kim Cattrall]
[Figure: parse tree of the sentence, with nodes S, NP, PP, VP, AUX, VBN, PP, VP, NP]
[Figure: the same parse tree with a "?" in place of the NP node]
more than 15%
Solution
k-best parses
  limited scope: 2^5 < 50 < 2^6 (a 50-best list covers only the combinations of 5 to 6 binary ambiguities)
  too much redundancy
[Figure: k-best list of parse trees for the example sentence, numbered 1, 2, 3, …, k]
Our Solution: Forest
A compact representation of many parses
  By sharing common sub-derivations
  Polynomial-space encoding of an exponentially large set
[Figure: a packed forest with nodes S, NP, PP, VP, AUX, VBN, PP, NP, VP, unpacked into individual parse trees]
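To make the "polynomial space, exponentially many parses" point concrete, here is a minimal sketch of a packed forest stored as a hyper-graph. The `Forest` class, its method names and the toy `ambiguous_forest` builder are hypothetical and exist only for this illustration; a real parser's forest would be built from chart items.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Forest:
    # Maps each node to its hyper-edges; a hyper-edge is a tuple of child nodes.
    # A node with no hyper-edges is a leaf.
    edges: dict = field(default_factory=dict)

    def add(self, head, *tails):
        """Add one hyper-edge head -> tails, registering unseen nodes."""
        self.edges.setdefault(head, []).append(tuple(tails))
        for tail in tails:
            self.edges.setdefault(tail, [])

    def count_trees(self, root):
        """Number of distinct trees encoded below root, with shared sub-results memoized."""
        memo = {}

        def count(node):
            if node not in memo:
                alternatives = self.edges[node]
                if not alternatives:
                    memo[node] = 1
                else:
                    # Sum over alternative hyper-edges, product over their children.
                    memo[node] = sum(
                        prod(count(t) for t in tails) for tails in alternatives
                    )
            return memo[node]

        return count(root)


def ambiguous_forest(n):
    """Toy forest with n independent binary ambiguities under one root."""
    forest = Forest()
    forest.add("S", *[f"X{i}" for i in range(n)])
    for i in range(n):
        forest.add(f"X{i}", f"a{i}")  # first analysis of constituent i
        forest.add(f"X{i}", f"b{i}")  # second analysis of constituent i
    return forest


toy = ambiguous_forest(6)
node_count = len(toy.edges)        # grows linearly with n
tree_count = toy.count_trees("S")  # grows as 2 ** n
```

With six binary ambiguities the toy forest has 19 nodes but encodes 64 trees, already more than a 50-best list can hold.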
Outline
Tree-based Semantic Role Labeling
  Parsing
  Selecting candidates
  Extracting features
  Classifying
Forest-based Semantic Role Labeling
Experiments
Conclusion
Parsing
(S (NP (DT This) (NN company))
   (NP (JJ last) (NN year))
   (VP (VBD sold)
       (NP (CD 1000) (NNS cars))
       (PP (IN in) (NP (DT the) (NNP U.S.)))))
Selecting Candidates
[Figure: the same parse tree, with the predicate "sold" and the candidate constituents marked]
Extracting Features
[Figure: the same parse tree]
Path to the predicate (candidate "This company"): NP↑S↓VP↓VBD
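The path feature can be computed by walking up from the candidate to the lowest common ancestor and then down to the predicate. The following is a minimal sketch under that reading; the `Node` class and `path_to_predicate` function are hypothetical names, and the tree is rebuilt by hand from the labels on the Parsing slide, where the predicate "sold" is tagged VBD.

```python
class Node:
    """A parse-tree node with a label, optional word and a parent pointer."""

    def __init__(self, label, children=(), word=None):
        self.label = label
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the chain node, parent, grandparent, ..., root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_to_predicate(argument, predicate):
    """Labels from the argument up to the common ancestor, then down to the predicate."""
    up = ancestors(argument)
    down = ancestors(predicate)
    index_in_down = {id(n): i for i, n in enumerate(down)}
    for i, node in enumerate(up):
        if id(node) in index_in_down:
            j = index_in_down[id(node)]
            break
    else:
        raise ValueError("nodes are not in the same tree")
    up_labels = [n.label for n in up[: i + 1]]          # argument ... common ancestor
    down_labels = [n.label for n in reversed(down[:j])]  # below ancestor ... predicate
    return "↑".join(up_labels) + "".join("↓" + label for label in down_labels)


# Tree for "This company last year sold 1000 cars in the U.S."
np_subj = Node("NP", [Node("DT", word="This"), Node("NN", word="company")])
np_time = Node("NP", [Node("JJ", word="last"), Node("NN", word="year")])
vbd = Node("VBD", word="sold")
np_obj = Node("NP", [Node("CD", word="1000"), Node("NNS", word="cars")])
np_loc = Node("NP", [Node("DT", word="the"), Node("NNP", word="U.S.")])
pp = Node("PP", [Node("IN", word="in"), np_loc])
vp = Node("VP", [vbd, np_obj, pp])
root = Node("S", [np_subj, np_time, vp])

subject_path = path_to_predicate(np_subj, vbd)
object_path = path_to_predicate(np_obj, vbd)
```

Candidates on opposite sides of the verb get different paths (here `NP↑S↓VP↓VBD` versus `NP↑VP↓VBD`), which is what makes the feature informative for the classifier.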
Position: left
Head word: company
Head POS tag: NN
…
Classifying
Computing scores using a trained classifier
[Figure: the same parse tree, with one score box per candidate constituent]
S(Agent)=0.1  S(Patient)=0.1  S(None)=0.5  …
S(AM-TMP)=0.9  S(Patient)=0.1  S(None)=0.1  …
S(Agent)=0.2  S(Patient)=0.8  S(None)=0.1  …
S(Agent)=0.8  S(Patient)=0.1  S(None)=0.1  …
S(AM-LOC)=0.9  S(Agent)=0.1  S(None)=0.1  …
Best score for each constituent
  S(Agent)=0.8  …
  S(AM-LOC)=0.9  …
  S(None)=0.5  …
  S(AM-TMP)=0.9  …
  S(Patient)=0.8  …
Simply sort them
Choose the best label sequence
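One way to read these three steps is as a greedy selection: keep the top label of every candidate, sort by score, and accept candidates that are not labeled None and do not overlap an accepted span. The sketch below follows that reading, which is an assumption; the slide does not spell out how conflicts are resolved. The function name, the token spans, and the pairing of each score box with a span are also hypothetical.

```python
def choose_labels(candidates):
    """candidates: list of (start, end, scores), scores maps label -> score.

    Returns the accepted (start, end, label) triples in sentence order.
    """
    best = []
    for start, end, scores in candidates:
        label = max(scores, key=scores.get)  # best label for this constituent
        best.append((scores[label], start, end, label))
    best.sort(key=lambda item: item[0], reverse=True)  # highest score first

    chosen = []
    for _, start, end, label in best:
        if label == "None":
            continue  # the classifier says this constituent is not an argument
        overlaps = any(start < e and s < end for s, e, _ in chosen)
        if not overlaps:
            chosen.append((start, end, label))
    return sorted(chosen)


# Token indices for "This company last year sold 1000 cars in the U.S."
candidates = [
    (0, 2, {"Agent": 0.8, "Patient": 0.1, "None": 0.1}),    # This company
    (2, 4, {"AM-TMP": 0.9, "Patient": 0.1, "None": 0.1}),   # last year
    (5, 7, {"Agent": 0.2, "Patient": 0.8, "None": 0.1}),    # 1000 cars
    (7, 10, {"AM-LOC": 0.9, "Agent": 0.1, "None": 0.1}),    # in the U.S.
    (8, 10, {"Agent": 0.1, "Patient": 0.1, "None": 0.5}),   # the U.S. (assumed span)
]
result = choose_labels(candidates)
```

Processing the highest scores first means that when two nested constituents compete, the more confident one wins.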
[Agent This company] [AM-TMP last year] [V sold] [Patient 1000 cars] [AM-LOC in the U.S.]
Outline
Tree-based Semantic Role Labeling
Forest-based Semantic Role Labeling
  Parsing into a forest
  Selecting candidates
  Extracting features on forest
  Classifying
Experiments
Conclusion
Forest
the role of Celimene is played by Kim Cattrall
[Figure: packed forest for the sentence, a hyper-graph made of nodes (S, NP, PP, VP, AUX, VBN, PP, NP, VP) connected by hyper-edges]
Selecting Candidates
the role of Celimene is played by Kim Cattrall
[Figure: the same forest, with the candidate nodes marked]
Extracting features
Path to the predicate
the role of Celimene is played by Kim Cattrall
[Figure: the same forest, where a candidate can reach the predicate along more than one path]
NP↑NP↑S↓VP↓VP↓VBN
NP↑S↓VP↓VP↓VBN (shortest)
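In a forest a node may sit under several hyper-edges, so the path feature has to pick one of several paths. The sketch below searches for the shortest one; it is only an illustration. The forest is reconstructed by hand from the node labels on the slide (two analyses of S, one with a nested NP), the `(label, start, end)` node encoding and the function name are assumptions, and the case where the predicate dominates the candidate is not handled.

```python
def shortest_path(forest, argument, predicate):
    """forest: node -> list of hyper-edges (tuples of child nodes).

    Returns the shortest up-then-down label path, or None if there is none.
    """
    memo = {}

    def down(node, target):
        """Shortest chain of nodes from node down to target, or None."""
        key = (node, target)
        if key not in memo:
            best = [node] if node == target else None
            if best is None:
                for tails in forest.get(node, []):
                    for tail in tails:
                        sub = down(tail, target)
                        if sub is not None and (best is None or len(sub) + 1 < len(best)):
                            best = [node] + sub
            memo[key] = best
        return memo[key]

    best = None
    # The turning point of the path is a hyper-edge whose children include
    # one ancestor of the argument and a different ancestor of the predicate.
    for top, edges in forest.items():
        for tails in edges:
            for a in tails:
                for p in tails:
                    if a == p:
                        continue
                    up_chain, down_chain = down(a, argument), down(p, predicate)
                    if up_chain is None or down_chain is None:
                        continue
                    up_labels = [n[0] for n in reversed(up_chain)] + [top[0]]
                    down_labels = [n[0] for n in down_chain]
                    path = "↑".join(up_labels) + "".join("↓" + l for l in down_labels)
                    length = len(up_labels) + len(down_labels)
                    if best is None or length < best[0]:
                        best = (length, path)
    return None if best is None else best[1]


# Nodes are (label, start, end) over "the role of Celimene is played by Kim Cattrall".
S, NP_SHORT, PP_OF = ("S", 0, 9), ("NP", 0, 2), ("PP", 2, 4)
NP_LONG, VP_OUTER, AUX = ("NP", 0, 4), ("VP", 4, 9), ("AUX", 4, 5)
VP_INNER, VBN, PP_BY = ("VP", 5, 9), ("VBN", 5, 6), ("PP", 6, 9)
forest = {
    S: [(NP_LONG, VP_OUTER), (NP_SHORT, PP_OF, VP_OUTER)],  # two analyses of S
    NP_LONG: [(NP_SHORT, PP_OF)],
    VP_OUTER: [(AUX, VP_INNER)],
    VP_INNER: [(VBN, PP_BY)],
}
feature = shortest_path(forest, NP_SHORT, VBN)
```

Requiring both chains to hang off different children of the same hyper-edge keeps the two halves of the path inside one derivation.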
Parent label: taken from the shortest path, NP↑S↓VP↓VP↓VBN
New Features
Parsing score (fractional value (Mi et al., 2008))
  Inside-outside
  Marginal prob.
the role of Celimene is played by Kim Cattrall
[Figure: the same forest, with the fractional value f(NP3) shown for an NP node]
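A node's marginal probability can be obtained with the inside-outside algorithm on the hyper-graph. The sketch below assumes the fractional value of a node v is outside(v) × inside(v) / inside(root); the exact definition used in the cited work is not given on the slide. The function names, the node encoding and the hyper-edge probabilities 0.6 and 0.4 are made up for illustration.

```python
from math import prod


def topological_order(forest, root):
    """Nodes reachable from root, each head listed before its children."""
    seen, post = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for _, tails in forest.get(node, []):
            for tail in tails:
                visit(tail)
        post.append(node)

    visit(root)
    return post[::-1]


def fractional_values(forest, root):
    """forest: node -> list of (probability, tails). Returns node -> marginal."""
    order = topological_order(forest, root)

    inside = {}
    for node in reversed(order):  # children before heads
        edges = forest.get(node, [])
        if not edges:
            inside[node] = 1.0  # leaf
        else:
            inside[node] = sum(p * prod(inside[t] for t in tails) for p, tails in edges)

    outside = {node: 0.0 for node in order}
    outside[root] = 1.0
    for node in order:  # heads before children
        for p, tails in forest.get(node, []):
            for i, tail in enumerate(tails):
                siblings = prod(inside[s] for j, s in enumerate(tails) if j != i)
                outside[tail] += outside[node] * p * siblings

    return {node: outside[node] * inside[node] / inside[root] for node in order}


# Hypothetical weighted forest; nodes are (label, start, end).
S, NP_SHORT, PP_OF = ("S", 0, 9), ("NP", 0, 2), ("PP", 2, 4)
NP_LONG, VP_OUTER, AUX = ("NP", 0, 4), ("VP", 4, 9), ("AUX", 4, 5)
VP_INNER, VBN, PP_BY = ("VP", 5, 9), ("VBN", 5, 6), ("PP", 6, 9)
forest = {
    S: [(0.6, (NP_LONG, VP_OUTER)), (0.4, (NP_SHORT, PP_OF, VP_OUTER))],
    NP_LONG: [(1.0, (NP_SHORT, PP_OF))],
    VP_OUTER: [(1.0, (AUX, VP_INNER))],
    VP_INNER: [(1.0, (VBN, PP_BY))],
}
values = fractional_values(forest, S)
```

In this toy forest the longer NP appears only in the first analysis of S, so its value equals that analysis's share of the probability mass, while nodes shared by both analyses get the full mass.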
Classifying
the role of Celimene is played by Kim Cattrall
[Figure: the same forest, with one score box per candidate node]
S(Patient)=0.8  S(Agent)=0.1  S(None)=0.2  …
S(Patient)=0.5  S(Agent)=0.1  S(None)=0.3  …
S(Agent)=0.8  S(Patient)=0.1  S(None)=0.2  …
Best scores: S(Patient)=0.8 … and S(Agent)=0.8 …
[Patient the role of Celimene] is played by [Agent Kim Cattrall]
Outline
Tree-based Semantic Role Labeling
Forest-based Semantic Role Labeling
Experiments
Conclusion
Experiments
Corpus: CoNLL-2005 shared task
  Sections 02-21 of PropBank for training
  Section 24 for development set
  Section 23 for test set
Total
  43,594 sentences
  262,281 arguments
Experiments
Training sentences
  Parse into 1-best and forest
  Prune forest using inside-outside algorithm
  Train classifiers
Decoding sentences
  Parse into 1-best and forest
  Prune forest using inside-outside algorithm
  Use classifiers
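The pruning step can be sketched as follows. The slides do not give the exact criterion, so this example assumes a common Viterbi-style variant: a hyper-edge is kept only if the best derivation passing through it costs at most `threshold` more than the best derivation overall, with costs taken as negative log-probabilities. The function name, the node encoding and the costs are hypothetical.

```python
def prune(forest, root, threshold):
    """forest: node -> list of (cost, tails). Returns the pruned forest."""
    # Order the reachable nodes so that every head precedes its children.
    seen, post = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for _, tails in forest.get(node, []):
            for tail in tails:
                visit(tail)
        post.append(node)

    visit(root)
    order = post[::-1]

    # Inside cost: cheapest sub-derivation rooted at each node (leaves cost 0).
    inside = {}
    for node in reversed(order):
        edges = forest.get(node, [])
        if not edges:
            inside[node] = 0.0
        else:
            inside[node] = min(c + sum(inside[t] for t in tails) for c, tails in edges)

    # Outside cost: cheapest context around each node.
    outside = {node: float("inf") for node in order}
    outside[root] = 0.0
    for node in order:
        for c, tails in forest.get(node, []):
            total = outside[node] + c + sum(inside[t] for t in tails)
            for tail in tails:
                outside[tail] = min(outside[tail], total - inside[tail])

    best = inside[root]
    pruned = {}
    for node in order:
        kept = [
            (c, tails)
            for c, tails in forest.get(node, [])
            if outside[node] + c + sum(inside[t] for t in tails) - best <= threshold
        ]
        if kept:
            pruned[node] = kept
    return pruned


# Hypothetical costs; nodes are (label, start, end).
S, NP_SHORT, PP_OF = ("S", 0, 9), ("NP", 0, 2), ("PP", 2, 4)
NP_LONG, VP_OUTER = ("NP", 0, 4), ("VP", 4, 9)
forest = {
    S: [(0.5, (NP_LONG, VP_OUTER)), (3.0, (NP_SHORT, PP_OF, VP_OUTER))],
    NP_LONG: [(0.0, (NP_SHORT, PP_OF))],
}
tight = prune(forest, S, threshold=1.0)
loose = prune(forest, S, threshold=3.0)
```

A larger threshold keeps more alternative analyses, which is one plausible reading of the forest(p3) and forest(p5) settings in the results.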
Features
Predicate lemma
Path to predicate
Path length
Partial path
Position
Voice
Head word/POS tag
…
Results on Dev Set
[Figure: bar chart of precision, recall and F for 1-best, 50-best, forest(p3) (9.63×10^5) and forest(p5) (5.78×10^6)]
Results on Test Set
Outline
Tree-based Semantic Role Labeling
Forest-based Semantic Role Labeling
Experiments
Conclusion
Conclusion
Forest
  Encodes exponentially many parses
  Enlarges the candidate space
  Allows richer features to be explored
Improves the quality significantly
  A very large forest is not necessary
  k-best lists can NOT simulate it
Future work
  Features on forest
Thank you!