An Entity-Mention Model for Coreference Resolution with Inductive Logic Programming Xiaofeng Yang 1 Jian Su 1 Jun Lang 2 Chew Lim Tan 3 Ting Liu 2 Sheng


An Entity-Mention Model for Coreference Resolution

with Inductive Logic Programming

Xiaofeng Yang1 Jian Su1 Jun Lang2

Chew Lim Tan3 Ting Liu2 Sheng Li2

Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen

1 Institute for Infocomm Research
2 Harbin Institute of Technology
3 National University of Singapore

ACL 2008


Introduction

Coreference resolution: the process of linking multiple mentions that refer to the same entity

Key terms: coreference, anaphor, antecedent

Inductive logic programming: supervised learning that induces rules from positive cases


Related Work

1. Mention-pair model
   Aone and Bennett (1995); McCarthy and Lehnert (1995); Soon et al. (2001); Ng and Cardie (2002)
   An individual mention usually lacks adequate descriptive information about the referred entity (e.g. "Powell" vs. "he")

2. Entity-mention model
   Luo et al. (2004); Yang et al. (2004)
   Cannot describe each individual mention in an entity

3. Inductive logic programming in NLP
   Parsing (Mooney, 1997)
   POS disambiguation (Cussens, 1996)
   Lexicon construction (Claveau et al., 2003)
   WSD (Specia et al., 2007)


Modeling Coreference Resolution

The probability that a mention belongs to an entity

Example

e1: Microsoft Corp. - its - The company
e2: its new CEO - he
e3: yesterday
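Scoring a mention against each partial entity can be pictured as a greedy left-to-right loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `toy_score`, the `GOLD` lookup, the 0.5 threshold, and the text order of the six mentions are all hypothetical stand-ins for a learned model and a real document.

```python
from typing import Callable, List

Entity = List[str]  # an entity is the ordered list of its mentions


def resolve(mentions: List[str],
            score: Callable[[Entity, str], float],
            threshold: float = 0.5) -> List[Entity]:
    """Attach each mention to the best-scoring partial entity, or start a new one."""
    entities: List[Entity] = []
    for m in mentions:
        best = max(entities, key=lambda e: score(e, m), default=None)
        if best is not None and score(best, m) >= threshold:
            best.append(m)
        else:
            entities.append([m])  # no entity is likely enough: new entity
    return entities


# Hypothetical gold labels for the example entities e1, e2, e3.
GOLD = {"Microsoft Corp.": "e1", "its": "e1", "The company": "e1",
        "its new CEO": "e2", "he": "e2", "yesterday": "e3"}


def toy_score(entity: Entity, mention: str) -> float:
    """Stand-in for a trained model: 1.0 if the gold labels agree, else 0.0."""
    return 1.0 if GOLD[entity[0]] == GOLD[mention] else 0.0


# Assumed text order of the mentions.
ORDER = ["Microsoft Corp.", "its", "its new CEO", "yesterday", "The company", "he"]
print(resolve(ORDER, toy_score))
```

With the toy scorer the loop recovers the three entities of the example; a real system would replace `toy_score` with the learned probability.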


Mention-Pair Model

Soon et al. (2001) and Ng and Cardie (2002)

Instance i{mk, mj}: mj is an active mention and mk is a preceding mention
  Positive: mj and its closest antecedent (only one per mj)
  Negative: mj paired with every intervening mention between mj and its closest antecedent

Resolution: mj is linked to the mention classified as positive (if any) with the highest confidence value
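The instance-selection and linking steps above can be sketched as follows. This is a minimal illustration: mentions are represented by their index in text order, `gold` holds hypothetical entity labels, and the 0.5 threshold for "classified as positive" is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def mention_pair_instances(gold: List[str]) -> List[Tuple[int, int, bool]]:
    """gold[i] is the entity label of the i-th mention in text order.
    Returns (k, j, label) triples for candidate m_k and active mention m_j."""
    instances = []
    for j in range(len(gold)):
        # closest preceding mention that belongs to the same entity, if any
        closest = next((k for k in range(j - 1, -1, -1) if gold[k] == gold[j]), None)
        if closest is None:
            continue  # m_j has no antecedent, so it yields no instance
        instances.append((closest, j, True))
        for k in range(closest + 1, j):  # intervening mentions are negatives
            instances.append((k, j, False))
    return instances


def best_first_link(confidences: Dict[int, float],
                    threshold: float = 0.5) -> Optional[int]:
    """Pick the candidate classified as positive with the highest confidence."""
    positives = {k: c for k, c in confidences.items() if c >= threshold}
    return max(positives, key=positives.get) if positives else None
```

For six mentions labeled `["e1", "e1", "e2", "e3", "e1", "e2"]`, the fifth mention produces one positive pair with the second mention and two negative pairs with the third and fourth.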


Feature Set for Coreference Resolution

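[Table: feature set. Only annotation labels survive: group markers 1, 2, 3 and the glosses "appositive" and "predicate".]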


Entity-Mention Model

Mention-pair model error: an individual mention lacks adequate descriptive information
  e.g. “Mr. Powell”, “Powell”, and “she”

Instance i{ek, mj}: ek is a partial entity found before the active mention mj
  Positive: mj and the entity to which mj belongs
  Negative: every entity whose last mention occurs between mj and its closest antecedent

If no positive entity exists, mj forms a new entity

Entity features: first-order features
  Any-X, Most-X, All-X
  Distance feature: the minimum distance between the mentions in the entity and the active mention
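A minimal sketch of the instance selection and the entity-level features described above. Several details are assumptions made for illustration: mentions are indices in text order, Most-X is read as "true for more than half of the mentions", distance is counted in mentions, and `pair_feature` is a hypothetical boolean mention-pair feature X.

```python
from typing import Callable, Dict, List, Tuple

Instance = Tuple[Tuple[int, ...], int, bool]  # (partial entity, active mention, label)


def entity_mention_instances(gold: List[str]) -> List[Instance]:
    """gold[i] is the entity label of the i-th mention in text order."""
    instances: List[Instance] = []
    for j, label in enumerate(gold):
        partial: Dict[str, List[int]] = {}  # entities built from mentions before m_j
        for k in range(j):
            partial.setdefault(gold[k], []).append(k)
        if label not in partial:
            continue  # m_j starts a new entity
        closest = partial[label][-1]  # closest antecedent of m_j
        instances.append((tuple(partial[label]), j, True))
        for other, members in partial.items():
            if other != label and members[-1] > closest:
                instances.append((tuple(members), j, False))
    return instances


def entity_features(entity: Tuple[int, ...], j: int,
                    pair_feature: Callable[[int, int], bool]) -> Dict[str, object]:
    """Aggregate a mention-pair feature X over all mentions of a partial entity."""
    values = [pair_feature(k, j) for k in entity]
    return {
        "any": any(values),
        "most": sum(values) > len(values) / 2,  # assumed reading of Most-X
        "all": all(values),
        "min_dist": min(j - k for k in entity),  # distance counted in mentions
    }
```

Unlike the mention-pair setting, each instance here carries the whole partial entity, so a feature can be satisfied by any one of its mentions.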


Entity-mention Model with ILP (1/3)

Tool: ALEPH by Srinivasan (2000) (Oxford)

Input:
  positive examples E+
  negative examples E-
  background knowledge K
Output: hypotheses h

e1_6 denotes the part of e1 before m6
Example representation: link(e1_6, m6)



Background knowledge K predicates

1. Information related to ei_j and mj
2. Relations between ei_j and its mentions
   has_mention(e1_6, m6)
3. Information related to mj and each mention mk in ei_j
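To make the representation concrete, the sketch below writes out `link` examples and `has_mention` facts as Prolog-style strings for one active mention. It is an illustration only: the mention numbering in `GOLD_ENTITY` is assumed, every partial entity is emitted without any instance filtering, and the function name is hypothetical rather than part of ALEPH.

```python
from typing import Dict, List, Tuple

# Assumed numbering: mention number -> entity number.
GOLD_ENTITY = {1: 1, 2: 1, 3: 2, 4: 3, 5: 1, 6: 2}


def aleph_style_facts(gold: Dict[int, int], j: int) -> Tuple[List[str], List[str], List[str]]:
    """Return (E+, E-, K) fact strings for the active mention m_j."""
    partial: Dict[int, List[int]] = {}
    for k in sorted(gold):
        if k < j:  # only mentions before m_j belong to the partial entity ei_j
            partial.setdefault(gold[k], []).append(k)
    pos: List[str] = []
    neg: List[str] = []
    background: List[str] = []
    for i, members in sorted(partial.items()):
        name = f"e{i}_{j}"  # the part of entity e_i before m_j
        example = f"link({name}, m{j})."
        (pos if gold[j] == i else neg).append(example)
        background += [f"has_mention({name}, m{k})." for k in members]
    return pos, neg, background


pos, neg, background = aleph_style_facts(GOLD_ENTITY, 6)
print(pos, neg, background, sep="\n")
```

Predicates of the first and third kind (properties of ei_j and mj, and pairwise relations between mj and each mk) would be appended to the background list in the same way.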



Hypothesis rule

link(A,B) :- has_mention(A,C), numAgree(B,C,1), strMatch_Head(B,C,1), bareNP(C,1).
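Read procedurally, the rule links mention B to entity A if A contains some mention C that agrees in number with B, matches B on the head string, and is a bare NP. The sketch below checks that condition against a set of ground facts; the fact tuples in `FACTS` are made up for illustration and the function is not ALEPH's own inference engine.

```python
from typing import Set, Tuple

Fact = Tuple[str, ...]  # (predicate, arg1, arg2, ...)

# Hypothetical ground facts for a toy entity e1 and mentions m1..m4.
FACTS: Set[Fact] = {
    ("has_mention", "e1", "m1"),
    ("has_mention", "e1", "m2"),
    ("numAgree", "m3", "m2", "1"),
    ("strMatch_Head", "m3", "m2", "1"),
    ("bareNP", "m2", "1"),
}


def link(a: str, b: str, facts: Set[Fact]) -> bool:
    """True if some mention C of entity a satisfies every body literal with b."""
    candidates = [f[2] for f in facts if f[0] == "has_mention" and f[1] == a]
    return any(
        ("numAgree", b, c, "1") in facts
        and ("strMatch_Head", b, c, "1") in facts
        and ("bareNP", c, "1") in facts
        for c in candidates  # C is existentially quantified
    )


print(link("e1", "m3", FACTS), link("e1", "m4", FACTS))
```

The variable C is what lets the rule pick out one particular mention inside the entity, which a flat Any-X or All-X feature cannot express across several predicates at once.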


Experiments and Results (1/4)

Corpus: ACE-2 V1.0 corpus (NIST, 2003)

ILP tool: ALEPH, with two settings modified
  Minimum rule accuracy: 100% lowered to 50%
  Maximum number of predicates per rule: 3 raised to 10


Baseline model: C4.5 algorithm

Preprocessing
  Tokenizer
  Part-of-Speech tagger
    accuracy of 97% on Penn WSJ TreeBank
  NP chunker (Zhou and Su, 2000)
    F-measure above 94% on Penn WSJ TreeBank
  Named-Entity Recognizer (Zhou and Su, 2002)
    F-measure of 96.6% (MUC-6) and 94.1% (MUC-7)


Experiments and Results (2/4)



F-measure is 2-4% lower than the state of the art, which utilized sophisticated semantic or real-world knowledge

Significant under a two-tailed t-test (p < 0.05)



Multiple non-instantiated arguments (i.e. C and D) could possibly appear in the same rule


Conclusion & Future Work

The model can express the relations between an entity and its mentions, and can automatically learn first-order rules

The ILP-based entity-mention model performs better than the mention-pair model (with up to a 2.3% increase in F-measure)

Future work: investigate more sophisticated clustering methods that would lead to global optimization
  keeping a large search space (Luo et al., 2004)
  using integer programming (Denis and Baldridge, 2007)


Thank you!
