An Entity-Mention Model for Coreference Resolution
with Inductive Logic Programming
Xiaofeng Yang1 Jian Su1 Jun Lang2
Chew Lim Tan3 Ting Liu2 Sheng Li2
Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen
1 Institute for Infocomm Research
2 Harbin Institute of Technology
3 National University of Singapore
ACL 2008
Introduction
Coreference resolution: the process of linking multiple mentions that refer to the same entity
Key terms: coreference; anaphor (the referring mention); antecedent (the earlier mention it refers to)
Inductive logic programming (ILP): supervised learning that induces rules from positive cases
Related Work
1. Mention-pair model
Aone and Bennett (1995); McCarthy and Lehnert (1995); Soon et al. (2001); Ng and Cardie (2002)
An individual mention usually lacks adequate descriptive information about the referred entity (e.g. "Powell" vs. "he")
2. Entity-mention model
Luo et al. (2004); Yang et al. (2004)
Cannot describe each individual mention in an entity
3. Inductive logic programming in NLP
Parsing (Mooney, 1997)
POS disambiguation (Cussens, 1996)
Lexicon construction (Claveau et al., 2003)
WSD (Specia et al., 2007)
Modeling Coreference Resolution
The probability that a mention belongs to an entity
Example
e1: Microsoft Corp. - its - The company
e2: its new CEO - he
e3: yesterday
Mention-Pair Model
Soon et al. (2001) and Ng and Cardie (2002)
Instance i{mk, mj}: mj is the active mention and mk is a preceding mention
Positive: mj and its closest antecedent (only one per mj)
Negative: mj paired with each intervening mention between mj and its closest antecedent
Resolution: mj is linked to the mention classified as positive (if any) with the highest confidence value
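The instance-generation and linking rules above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the mention ids, the document order assigned to the example mentions, the `confidence` callback, and the 0.5 threshold are all assumptions made for illustration.

```python
# Assumed document order for the example mentions (hypothetical ids):
# m1 "Microsoft Corp.", m2 "its", m3 "its new CEO", m4 "yesterday",
# m5 "The company", m6 "he"
MENTIONS = ["m1", "m2", "m3", "m4", "m5", "m6"]
ENTITY_OF = {"m1": "e1", "m2": "e1", "m3": "e2",
             "m4": "e3", "m5": "e1", "m6": "e2"}


def training_instances(mentions, entity_of):
    """Return (mk, mj, label) triples for the mention-pair model."""
    instances = []
    for j, mj in enumerate(mentions):
        # Scan backwards for the closest preceding mention of the same entity.
        closest = None
        for k in range(j - 1, -1, -1):
            if entity_of[mentions[k]] == entity_of[mj]:
                closest = k
                break
        if closest is None:
            continue  # mj has no antecedent, so it yields no instances
        # One positive instance: mj with its closest antecedent.
        instances.append((mentions[closest], mj, 1))
        # Negative instances: mj with every mention in between.
        for k in range(closest + 1, j):
            instances.append((mentions[k], mj, 0))
    return instances


def resolve(preceding, mj, confidence, threshold=0.5):
    """Link mj to the preceding mention with the highest confidence above
    the threshold; return None when no candidate is classified positive."""
    best, best_score = None, threshold
    for mk in preceding:
        score = confidence(mk, mj)
        if score > best_score:
            best, best_score = mk, score
    return best


if __name__ == "__main__":
    for instance in training_instances(MENTIONS, ENTITY_OF):
        print(instance)
```

In a real system `confidence` would be the output of the trained classifier; here it is left as a parameter so the linking step can be read on its own.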
Feature Set for Coreference Resolution
[Feature table not recovered from the slide. Surviving annotations: appositive, predicate]
Entity-Mention Model
Mention-pair model error: an individual mention lacks adequate descriptive information, e.g. "Mr. Powell", "Powell", and "she"
Instance i{e, mj}: e is a partial entity found before the active mention mj
Positive: mj and the entity to which mj belongs
Negative: mj and every entity whose last mention occurs between mj and its closest antecedent
Resolution: if no positive entity exists, mj forms a new entity
Entity features: first-order features Any-X, Most-X, All-X
Distance feature: the minimum distance between the mentions in the entity and the active mention
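The entity-level features can be illustrated with a short Python sketch. It assumes that each pairwise feature X has already been evaluated between the active mention and every mention in the entity, that "Most" means strictly more than half, and that mention positions are given in some unit such as sentence index. None of these choices is stated on the slide.

```python
def aggregate(values):
    """Turn per-mention boolean values of a pairwise feature X into the
    entity-level features Any-X, Most-X and All-X.

    values: one boolean per mention in the entity.
    Assumption: "most" means strictly more than half of the mentions.
    """
    total = len(values)
    hits = sum(values)
    return {
        "any": hits > 0,
        "most": hits * 2 > total,
        "all": total > 0 and hits == total,
    }


def min_distance(entity_positions, active_position):
    """Minimum distance between the active mention and any mention in the
    entity. Positions are in a hypothetical unit, e.g. sentence index."""
    return min(active_position - p for p in entity_positions)


if __name__ == "__main__":
    # Hypothetical values of a number-agreement feature for three mentions.
    print(aggregate([True, True, False]))
    print(min_distance([0, 1, 4], 5))
```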
Entity-mention Model with ILP (1/3)
Tool: ALEPH by Srinivasan (2000) (Oxford)
Input: positive examples E+
       negative examples E-
       background knowledge K
Output: hypotheses h
e1_6 denotes the part of e1 before m6
Example representation: link(e1_6, m6)
Entity-mention Model with ILP (2/3)
Background knowledge K predicates:
1. Information related to ei_j and mj
2. Relations between ei_j and its mentions
   has_mention(e1_6 , m6)
3. Information related to mj and each mention mk in ei_j
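As a rough illustration of how link examples and has_mention facts could be produced for one active mention, here is a Python sketch that emits Prolog-style fact strings. The mention ids and their document order are assumptions, as is the choice to list under each partial entity only the mentions that precede the active mention and to emit nothing when the active mention has no antecedent. The exact input format expected by ALEPH is not shown.

```python
# Assumed document order for the example mentions (hypothetical ids).
MENTIONS = ["m1", "m2", "m3", "m4", "m5", "m6"]
ENTITY_OF = {"m1": "e1", "m2": "e1", "m3": "e2",
             "m4": "e3", "m5": "e1", "m6": "e2"}


def facts_for(mentions, entity_of, j):
    """Build background facts and link examples for active mention mentions[j].

    Returns (background, positives, negatives) as lists of fact strings.
    """
    mj = mentions[j]
    # Partial entities: entity id -> indices of its mentions before position j.
    partial = {}
    for k in range(j):
        partial.setdefault(entity_of[mentions[k]], []).append(k)

    gold = entity_of[mj]
    if gold not in partial:
        return [], [], []  # assumption: no examples when mj has no antecedent
    closest = partial[gold][-1]  # index of mj's closest antecedent

    background, positives, negatives = [], [], []
    for ent, indices in partial.items():
        is_positive = ent == gold
        # Negative entities must have their last mention after the
        # closest antecedent of mj.
        if not is_positive and indices[-1] < closest:
            continue
        name = f"{ent}_{j + 1}"  # e.g. e1_6 is the part of e1 before m6
        for k in indices:
            background.append(f"has_mention({name}, {mentions[k]}).")
        target = positives if is_positive else negatives
        target.append(f"link({name}, {mj}).")
    return background, positives, negatives


if __name__ == "__main__":
    bk, pos, neg = facts_for(MENTIONS, ENTITY_OF, 5)
    print("\n".join(bk + pos + neg))
```

Under the assumed mention order, the active mention m6 yields link(e2_6, m6) as the positive example and link(e1_6, m6) and link(e3_6, m6) as negatives.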
Entity-mention Model with ILP (3/3)
Hypothesis rule
link(A,B) :- has_mention(A,C), numAgree(B,C,1), strMatch_Head(B,C,1), bareNP(C,1).
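Read procedurally, the rule above says that mention B links to entity A if A has some mention C that agrees with B in number, matches B's head string, and is a bare NP. The Python sketch below evaluates that reading over toy fact sets. The predicate names are read here as has_mention and strMatch_Head, and the facts themselves are hypothetical, not taken from the corpus.

```python
def link(entity, mention, has_mention, num_agree, str_match_head, bare_np):
    """Evaluate the learned rule for one (entity, mention) pair.

    has_mention: dict mapping an entity id to its mentions
    num_agree, str_match_head: sets of (B, C) pairs whose predicate value is 1
    bare_np: set of mentions C whose bareNP value is 1
    """
    # The variable C is existentially quantified: one witness is enough.
    return any(
        (mention, c) in num_agree
        and (mention, c) in str_match_head
        and c in bare_np
        for c in has_mention.get(entity, ())
    )


if __name__ == "__main__":
    # Hypothetical facts for a toy entity and an active mention "mB".
    has_mention = {"eA": ["c1", "c2"]}
    num_agree = {("mB", "c1"), ("mB", "c2")}
    str_match_head = {("mB", "c2")}
    bare_np = {"c2"}
    print(link("eA", "mB", has_mention, num_agree, str_match_head, bare_np))
```

Because only one witness C is needed, a single well-matched mention in the entity is enough for the rule to fire, which is the kind of mention-level evidence the entity-mention model with ILP is meant to capture.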
Experiments and Result(1/4)
Corpus: ACE-2 V1.0 corpus (NIST, 2003)
Modified settings of the ILP tool, ALEPH:
Rule accuracy: changed from 100% to 50%
Rule length: changed from 3 predicates to 10 predicates
Baseline model: C4.5 algorithm
Preprocessing:
Tokenizer
Part-of-Speech tagger: accuracy of 97% on Penn WSJ TreeBank
NP chunker (Zhou and Su, 2000): F-measure above 94% on Penn WSJ TreeBank
Named-Entity Recognizer (Zhou and Su, 2002): F-measure of 96.6% (MUC-6) and 94.1% (MUC-7)
Experiments and Result(2/4)
Experiments and Result(3/4)
F-measure is 2-4% lower than the state of the art, which used sophisticated semantic or real-world knowledge
Significant under a 2-tailed t-test (p < 0.05)
Experiments and Result(4/4)
Multiple non-instantiated arguments (e.g. C and D) can appear in the same rule
Conclusion & Future Work
The model can express the relations between an entity and its mentions, and automatically learn first-order rules
The ILP-based entity-mention model performs better than the mention-pair model (with up to 2.3% increase in F-measure)
Future work: investigate more sophisticated clustering methods that would lead to global optimization
keeping a large search space (Luo et al., 2004)
using integer programming (Denis and Baldridge, 2007)
Thank you!