ANNOTATING EVENT ANAPHORA: A CASE STUDY Tommaso Caselli and Irina Prodanof ILC-CNR, Pisa [email protected] [email protected] LREC-10 – May, 19th, La Valletta, Malta

Outline

- Motivations
- Coreference annotation in TimeML
- Annotating event anaphora: a preliminary scheme
- Annotation methodology and results
- Lessons learned and future work

Motivations

- Eventualities are the building blocks of the informative content of a document
- Eventualities give rise to relations which create a rich informative network:
  - temporal relations
  - sharing of participants
  - factivity
  - coreferential relations
- Coreferential relations among eventualities play an important role in facilitating access to content and extracting relevant information

Coref. in TimeML

- TimeML & ISO-TimeML are standards for the annotation of events, temporal expressions and a set of relations between these entities (temporal, subordinating and aspectual relations)
- Main contribution of TimeML: a standard definition of event and a methodology for its annotation
- It-TimeML: Italian adaptation of TimeML (updated version on request) and part of ISO-TimeML
- It-TimeML is currently used for the creation of the Italian TimeBank (172 news articles from ISST, PAROLE and Web, 67,140 tokens)
- TimeML tags involved: EVENT and TLINK (temporal link)
- TimeML has no specific link for coreference
- Annotation workaround: use of a special value of the TLINK tag, “identity”
- “identity” is used to:
  - connect two tokens which are part of a single event instance (e.g. light verbs)
  - mark coreferential relations between events, namely set-subset relations

Coref. in TimeML (2)

fare la spesa [to do shopping].
<EVENT id="e1">fare</EVENT> la <EVENT id="e2">spesa</EVENT>
<TLINK lid="l1" eventInstanceID="e1" relatedToEventInstance="e2" relType="IDENTITY"/>

Coref. in TimeML (3) – Use of “identity”

La sessione privata servira’ a tre [adempimenti]j. Innanzitutto, all’[approvazione]j della proposta di Abete (ISST sole006).

The private session will be used for three [fulfillments]j. First, the [approval]j of the proposal of Abete.

La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira’</EVENT> a tre <EVENT id="e3">adempimenti</EVENT>. <SIGNAL id="s1">Innanzitutto</SIGNAL>, all’ <EVENT id="e4">approvazione</EVENT> della <EVENT id="e5">proposta</EVENT> di Abete.

<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
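To make the workaround concrete, the following is a minimal Python sketch that reads the annotated sentence above and lists the event pairs joined by an “identity” TLINK. The `<TimeML>` root element and the helper name `identity_pairs` are assumptions added for illustration; the tags and attributes are the ones shown on the slide, with the accent apostrophes written as plain ASCII.

```python
import xml.etree.ElementTree as ET

# Annotated sentence from the slide. The <TimeML> wrapper is an assumption,
# added only so that the fragment parses as one well-formed document.
FRAGMENT = """<TimeML>La <EVENT id="e1">sessione</EVENT> privata
<EVENT id="e2">servira'</EVENT> a tre <EVENT id="e3">adempimenti</EVENT>.
<SIGNAL id="s1">Innanzitutto</SIGNAL>, all' <EVENT id="e4">approvazione</EVENT>
della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
</TimeML>"""


def identity_pairs(xml_text):
    """Return (event text, related event text) for every IDENTITY TLINK."""
    root = ET.fromstring(xml_text)
    # Map each event ID to the token it annotates.
    events = {event.get("id"): event.text for event in root.iter("EVENT")}
    pairs = []
    for link in root.iter("TLINK"):
        if link.get("relType") == "IDENTITY":
            source = events[link.get("eventInstanceID")]
            target = events[link.get("relatedToEventInstance")]
            pairs.append((source, target))
    return pairs


pairs = identity_pairs(FRAGMENT)
print(pairs)  # [('approvazione', 'adempimenti')]
```

Note that the link alone does not say whether the pair is a light-verb construction or a set-subset relation; both cases look the same to this code.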

Coref. in TimeML (4)

- The use of the value “identity” is not satisfactory since it is NOT homogeneous
- During the (current!) annotation effort for the creation of the Italian TimeBank we have observed that this value could be applied to other cases, such as:
  - synonyms
  - hypernyms
  - coreference (strict coreference: same referent in the world)

Event Anaphora - Previous work (1)

- Previous work: Hasler et al. 2006; Bejan & Harabagiu 2008
- Hasler et al. 2006:
  - only NP coreference (strict definition)
  - detailed guidelines, but NO specifications for the annotation
  - which events? ACE event frames (LIFE, CONFLICT, MOVEMENT, JUSTICE, ...)
  - TimeML compliant
- Bejan & Harabagiu 2008:
  - event coreference as a side effect of event structure
  - event coreference is considered when two predicates express the same predicate, synonyms or hypernyms, and share the same arguments
  - TimeML compliant

Event Anaphora - Methodology (2)

Our approach:
- no event frames or event templates
- all instances of events annotated in the Italian TimeBank (TimeML compliant)
- open-domain text/discourse
- coarse-grained, bottom-up approach in the definition of the annotation scheme
- reduced and limited set of guidelines
- active discovery of what is needed through annotation and observations from the data
- event anaphora: strict coreference + indirect coreference

Event Anaphora - Annotation scheme (3)

TAGS        ATTRIBUTES
MARKABLE    ID, POS, DEFINITENESS, CLASS
EMPTY       ID
TOPIC       ID
LINK        ID, ANAPHORTYPE, SRC

<MARKABLE> = <EVENT>, BUT extended: it also includes the annotation of pronouns and adverbs.
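The tag inventory above can be mirrored as plain data records. The sketch below is one possible Python rendering, not the authors' implementation: the class and field names simply follow the table, while the attribute values and the consistency check `dangling_links` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Markable:
    id: str
    pos: str           # POS
    definiteness: str  # DEFINITENESS
    event_class: str   # CLASS ("class" is a reserved word in Python)


@dataclass
class Empty:
    id: str


@dataclass
class Topic:
    id: str


@dataclass
class Link:
    id: str
    anaphor_type: str  # ANAPHORTYPE
    src: str           # SRC: ID of the anchor element


def dangling_links(elements, links):
    """Return the links whose SRC does not match the ID of any known element."""
    known_ids = {element.id for element in elements}
    return [link for link in links if link.src not in known_ids]


# Hypothetical attribute values, for illustration only.
elements = [
    Topic(id="t1"),
    Markable(id="m1", pos="ADV", definiteness="none", event_class="OCCURRENCE"),
    Empty(id="z1"),
]
links = [
    Link(id="l1", anaphor_type="indirect", src="t1"),
    Link(id="l2", anaphor_type="strict", src="m9"),  # m9 was never declared
]

bad = dangling_links(elements, links)
print([link.id for link in bad])  # ['l2']
```

A check of this kind is cheap to run after each annotation session and catches links whose anchor was deleted or mistyped.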

Event Anaphora - Annotation scheme (4)

<EMPTY> = to annotate cases of zero anaphora and ellipsis (frequent in Italian)

<TOPIC> = to annotate entire portions of text; it provides an anchor for those linguistic entities which can refer to a discourse topic

“[Stiamo ancora parlando, come certamente deve essere, e continueremo a consultarci]”j. James Baker, segretario al Tesoro americano, ha commentato [cosi’]j i risultati dell’assemblea. (ISST els019)

“[We are still talking, as it certainly should be, and we will keep consulting each other]”j. James Baker, the American Treasury Secretary, commented [thus]j on the results of the assembly.

<LINK> = it marks up an anaphoric relation. The attribute “anaphorType” specifies the type of anaphoric relation; “src” marks the anchor

Event Anaphora - Results (5)

- Annotation tool: PALinkA (Orasan, 2003)
- 3 annotators / 1,792 tokens
- no K scores
- Low agreement on the identification of anaphora, but relatively good agreement on the anchors
- More specific guidelines and information are needed
- Event anaphora is a widespread phenomenon
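No K scores were computed for this pilot. Assuming “K” refers to a kappa statistic, the sketch below shows how Fleiss' kappa could be obtained for three annotators once each candidate token has been labelled. The counts are toy data invented for illustration and are not results from the study; the function name `fleiss_kappa` is likewise an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j; every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Observed agreement: mean over items of the pairwise agreement rate.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data: 4 tokens, 3 annotators, categories [anaphoric, not anaphoric].
full_agreement = [[3, 0], [0, 3], [3, 0], [0, 3]]
split_votes = [[2, 1], [1, 2], [2, 1], [1, 2]]

kappa_full = fleiss_kappa(full_agreement)
kappa_split = fleiss_kappa(split_votes)
print(kappa_full, kappa_split)
```

With full agreement the statistic is 1; with a 2-to-1 split on every token it drops below zero, meaning the agreement is worse than the chance level.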

Lessons Learned and Future Work

- Event anaphora is a widespread phenomenon which must be addressed in separate tasks:
  - relations between full events (N, V, PP and Adj)
  - no pronominal anaphora
- New annotation scheme:
  - 2 tags: <EVENT> and <AnafLink>
  - different attributes for <EVENT>: FACTIVITY, GENERICITY, POLARITY
  - relations between particular events according to the attributes' values
  - reduced types of anaphors (two values: direct vs. indirect)
- Tracking of the participants: how to?
- Event anaphora annotation as a further link in TimeML, or as a separate task which can be built upon the TimeML annotation
- New tool: BAT (thanks to Marc Verhagen)
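As a rough illustration of the two-tag scheme, the Python sketch below builds an `<EVENT>` pair and an `<AnafLink>` with the standard library. Only the tag names, the three `<EVENT>` attributes and the direct/indirect distinction come from the slide; the `<AnafLink>` attribute names (`anaphor`, `antecedent`, `type`), all attribute values and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class AnaphorType(Enum):
    # The two values named on the slide.
    DIRECT = "direct"
    INDIRECT = "indirect"


def make_event(event_id, text, factivity, genericity, polarity):
    """Build an <EVENT> element carrying the three proposed attributes."""
    event = ET.Element("EVENT", {
        "id": event_id,
        "FACTIVITY": factivity,
        "GENERICITY": genericity,
        "POLARITY": polarity,
    })
    event.text = text
    return event


def make_anaf_link(link_id, anaphor_id, antecedent_id, anaphor_type):
    """Build an <AnafLink>; its attribute names are assumptions."""
    return ET.Element("AnafLink", {
        "id": link_id,
        "anaphor": anaphor_id,
        "antecedent": antecedent_id,
        "type": anaphor_type.value,
    })


# Hypothetical attribute values, for illustration only.
antecedent = make_event("e3", "adempimenti", "non-factual", "specific", "POS")
anaphor = make_event("e4", "approvazione", "non-factual", "specific", "POS")
link = make_anaf_link("a1", anaphor.get("id"), antecedent.get("id"),
                      AnaphorType.INDIRECT)

print(ET.tostring(link, encoding="unicode"))
```

Keeping the anaphor type in an enumeration makes it impossible to write a third value by accident, which fits the stated goal of a reduced set of types.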

Lessons Learned and Future Work - Example

Thank you!