“Mini” Tutorial: Recognition and Normalization of Time Expressions
Matteo Negri
1st ONTOTEXT Project Workshop, Trento, 25/11/2004


Outline

Reasoning about Time

TIMEX2

The TERN Experience

CHRONOS and ONTOTEXT


Reasoning About Time

• Crucial step towards automatic language comprehension
– Broad variety of related issues (e.g. What is the basic temporal unit? How to represent the temporal meaning of an expression? Which parts of a text convey a temporal meaning? How to capture the relation between time and events? …)

• What this talk IS about: the interpretation of the meaning of expressions that refer to time
– i.e. expressions telling us when something happened, how long something lasted, or how often something occurs

• What this talk IS NOT about: event expressions, temporal anchoring of events, dependencies between events and times, etc.


Interpreting the Meaning of Temporal Expressions

• What we want to do: place any temporal expression present in a given input text on a timeline (a discrete, unlimited, and totally ordered sequence of points)
– points (“2004”)
– intervals (“four years”)
– sets of times (“every 3 years”)

Four more years of excavation and then another two devoted solely to restoration. The Ronchi di Mattarello quarry, which for half a century has been used to extract limestone for the building industry, is about to end its life cycle. At yesterday's session the provincial environment committee decided to grant the six-year extension requested by Cava di Ronchi srl, which a few months ago took over from a real-estate company the business that once belonged to Pedrotti asfalti. Quarrying proper, however, may continue only until 2008. After that the company will have to concentrate solely on the care and restoration of the site. The extension granted yesterday, which must now be confirmed by the provincial government, is the last one. The resolution will say so explicitly. A few decades ago this kind of activity raised no major objections in that area, but now the city and the suburb have grown, and the dust and the constant truck traffic are no longer considered compatible. This is also because the hollow that hosts the quarry lies right next to the Casteller nature area, managed by the Hunters' Federation, which the Province would like to enhance with substantial public funding. Even after the quarry closes, the company will still be able to continue with its other activity, asphalt production. Every three years, the operating cycle involves excavating the material and sorting it for sale. The hole is then gradually filled with material from demolitions, which first passes through the separation section for recovery and recycling and then through the screen and the crusher. Here too, part of it is directed to asphalt production.

[Figure: the temporal expressions of the text mapped onto a timeline running from 1993 to 2004]


Interpreting the Meaning of Temporal Expressions cont.

• What we need to do:

1. Detect the temporal expressions present in a text and determine their extension

2. Model the temporal context in which they occur

• Human language is full of context-dependent time expressions (e.g. “today”, “next week”, “November 25”) which refer to a particular temporal location (day, month, year)

3. Provide a normalization framework in order to encode the same meaning (e.g. “November 25, 2004”, “25/11/2004”, “2004/11/25”) in the same way (e.g. “2004-11-25” in ISO 8601 format)
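The normalization step in point 3 can be illustrated with a minimal Python sketch. It is not the approach of any system discussed in this talk: the list `KNOWN_FORMATS` and the function `to_iso` are hypothetical names, only the three surface forms from the slide are covered, and the month-name format assumes an English (or default C) locale.

```python
from datetime import datetime

# Hypothetical list of surface formats; a real system would need many more.
KNOWN_FORMATS = ["%B %d, %Y", "%d/%m/%Y", "%Y/%m/%d"]

def to_iso(text):
    """Return the ISO 8601 calendar date for `text`, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format does not fit, try the next one
    return None

for surface in ["November 25, 2004", "25/11/2004", "2004/11/25"]:
    print(surface, "->", to_iso(surface))
```

All three surface forms map to the same string, which is what makes the normalized value usable for comparison and reasoning.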


Motivation: Question Answering

• Question reformulation

Q: “Is Bill Clinton currently the President of the United States?”

“currently” = November 2004

Q’: “Is Bill Clinton the President of the United States in November 2004?”

Q’’: “Who is the President of the United States in November 2004?”


Motivation: Question Answering

• Answer selection

Q: “When did J.R.R. Tolkien retire from his professorship at Oxford?”

In 1957, Tolkien was to travel to the United States to accept honorary degrees from Marquette, Harvard, and several other universities, but the trip was cancelled due to the ill health of his wife Edith. He retired two years later (=1959) from his professorship at Oxford

“The Adventures of Tom Bombadil” was published in 1962, three years after (=1959) Tolkien retired his professorship at Oxford.

…Tolkien makes a brief allusion to the future of Middle-earth in a letter written in 1958. The following year (=1959), after his retirement from teaching at Oxford, he …

1957: 1, 1958: 1, 1959: 3, 1962: 1

A: 1959
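The counting behind this answer can be sketched in a few lines of Python. The sketch assumes the time expressions in the three passages have already been normalized to years (the values below are the ones shown on the slide); the variable names are illustrative only.

```python
from collections import Counter

# Years found in the three passages after normalization
# ("two years later", "three years after", "the following year" all resolve to 1959).
candidate_years = ["1957", "1959", "1962", "1959", "1958", "1959"]

votes = Counter(candidate_years)
answer, count = votes.most_common(1)[0]

print(dict(votes))
print("A:", answer)
```

Without normalization, the three relative expressions would not count as votes for the same year, and no candidate would stand out.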


Motivation: Question Answering

• Advanced reasoning

Q: “Could Mozart and Beethoven meet in Vienna?”

“In 1784 Beethoven was able to deputize for his teacher. Three years later, recognizing his talent, Prince Maximilian Franz sent him to Vienna to further his education. He would soon return less than four months later on the news that his mother was dying. She passed away on July 17th 1787.”

“Mozart went to Munich to compose the opera late in 1780. Soon after, he was summoned from Munich to Vienna, where the Salzburg court was in residence on the accession of a new emperor. Mozart lived in Vienna for the rest of his life, until he died in 1791.”

Beethoven in Vienna: 1787

Mozart in Vienna: 1780–1791

A: YES, in 1787
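Once both stays are normalized to year intervals, the reasoning reduces to an interval-overlap check. The following Python sketch illustrates this under the assumption of closed intervals at year granularity; the function name `overlap` is hypothetical.

```python
def overlap(a, b):
    """Return the overlapping (start, end) years of two closed intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

mozart_in_vienna = (1780, 1791)     # interval shown on the slide
beethoven_in_vienna = (1787, 1787)  # a stay of less than four months in 1787

print(overlap(mozart_in_vienna, beethoven_in_vienna))
```

A non-empty overlap only shows that a meeting was possible in time and place, which is exactly what the question asks.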


Motivation: other NLP areas

• Information Retrieval

“Give me the articles from the press one week after the election day”

• Summarization

“Give me a short biography on Mozart, in chronological order”

Need to know when events occur, to avoid inappropriate merging of distinct events

• …


TIMEX2

• Annotation standard for temporal expressions

• Extends the MUC’s definition of the TIMEX named entities category by:
– Including a broader variety of expressions (e.g. “daily”, “three years later”, “now”, “18-year-old”)
– Replacing the TYPE (DATE vs TIME) categorization attribute with a set of attributes expressing the normalized, intended meaning of a temporal expression

About TIMEX and the MUC Named Entity task:

http://www.itl.nist.gov/iaui/894.02/related_projects/muc/


TIMEX2: Annotation Format

• Temporal expressions are annotated by inserting a special SGML tag around the text string, as in:

<TIMEX2>Christmas</TIMEX2>

• In addition, the TIMEX2 tag may contain one or more attributes, as in:

<TIMEX2 val="2005-11-25TAF" mod="START">early this afternoon</TIMEX2>


TIMEX2: Markable Expressions

• Features of a markable expression:

– The syntactic head of the expression must be an appropriate lexical trigger, or a pronoun that co-refers with a markable time expression

– Each lexical trigger is a word or numeric expression whose meaning conveys a temporal unit or concept

– To be a trigger, the referent must be able to be oriented on a timeline with a relation to a time (past, present, future)

For details, see Ferro et al.: TIDES 2003 Standard for the Annotation of Temporal Expressions, September 2003


TIMEX2: Normalization Attributes

• Designed to consistently capture the semantics of markable expressions in the annotations

Attribute      Function                                                                      Example
VAL            Contains a normalized form of the date/time (ISO 8601 format)                VAL=“2004-11-25”
MOD            Captures temporal modifiers                                                   MOD=“APPROX”
ANCHOR_VAL     Contains a normalized form of an anchoring date/time                          ANCHOR_VAL=“2004-11-24”
ANCHOR_DIR     Captures the relative direction/orientation between VAL and ANCHOR_VAL       ANCHOR_DIR=“BEFORE”
SET            Identifies expressions denoting sets of times                                 SET=“YES”
NON_SPECIFIC   Identifies non-specific expressions                                           NON_SPECIFIC=“YES”
COMMENT        Contains any comment the annotator wants to add                               COMMENT=“any string”


TIMEX2: Some Examples

1. <TIMEX2 val="2004-11-25" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">today</TIMEX2>

2. <TIMEX2 val="XXXX-XX-XX" mod="" set="YES" non_specific="YES" anchor_val="" anchor_dir="" comment="">daily</TIMEX2>

3. <TIMEX2 val="19" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">the 20th century</TIMEX2>

4. <TIMEX2 val="PRESENT_REF" mod="" set="" non_specific="" anchor_val="1998-10-03" anchor_dir="AS_OF" comment="">now</TIMEX2>

5. <TIMEX2 val="P.66CE" mod="LESS_THAN" set="" non_specific="" anchor_val="1998" anchor_dir="ENDING" comment="">nearly two-thirds of a century</TIMEX2>

6. <TIMEX2 val="P13Y" mod="" set="" non_specific="" anchor_val="2000" anchor_dir="ENDING" comment="">the 13 years of Milosevic's rule</TIMEX2>

7. <TIMEX2 val="1998-W46" mod="START" set="" non_specific="" anchor_val="" anchor_dir="" comment="">Early this week</TIMEX2>

8. <TIMEX2 val="" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">Cold War-era</TIMEX2>

9. <TIMEX2 val="P27Y" mod="" set="" non_specific="" anchor_val="2000" anchor_dir="ENDING" comment="">27-year-old</TIMEX2>
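To make the examples above easier to inspect, the following Python sketch reads in-line TIMEX2 annotations back into attribute dictionaries and drops the empty attributes. It is only an illustration: `parse_timex2` is a hypothetical helper, it relies on simple regular expressions rather than an SGML parser, and it does not handle nested TIMEX2 tags.

```python
import re

# Simplified patterns: one non-nested TIMEX2 element with double-quoted attributes.
TAG_RE = re.compile(r"<TIMEX2([^>]*)>(.*?)</TIMEX2>", re.DOTALL)
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')

def parse_timex2(text):
    """Return a list of (expression, non-empty attributes) pairs found in `text`."""
    results = []
    for attrs, body in TAG_RE.findall(text):
        filled = {name.lower(): value for name, value in ATTR_RE.findall(attrs) if value}
        results.append((body.strip(), filled))
    return results

sample = ('<TIMEX2 val="P13Y" mod="" set="" non_specific="" anchor_val="2000" '
          'anchor_dir="ENDING" comment="">the 13 years of Milosevic\'s rule</TIMEX2>')

for expression, attributes in parse_timex2(sample):
    print(expression, attributes)
```

For example 6 this leaves only the three attributes that carry information: a 13-year duration (`val`) ending (`anchor_dir`) in 2000 (`anchor_val`).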


Outline

Reasoning about Time

TIMEX2

The TERN Experience

CHRONOS and ONTOTEXT
– Overview
– CHRONOS Architecture
– Detection and Bracketing
– Normalization
– Results


Time Expression Recognition and Normalization (TERN)

• Task: detect and normalize (with TIMEX2 tags) all the temporal expressions occurring in the source data

• Time span: April-September 2004

• Organizers: NIST, MITRE Corp.

• Sponsor: Automatic Content Extraction (ACE) program
– Started in September 1999
– Administered by NSA, NIST, and CIA
– ACE’s objective: develop NLP technology to support automatic understanding of textual data

For further information:

TERN: http://timex2.mitre.org/tern.html

ACE: http://itl.nist.gov/iaui/894.01/tests/ace/


TERN-2004

• Two source languages: English and Chinese
• Two separate tasks:
– Detection
– Detection + Normalization
• Input:
– Broadcast news and newswire texts
• Output:
– In-line annotation with TIMEX2 tags
• Evaluation figures:
– Correct, incorrect, misses, spurious, undergeneration, overgeneration, substitution, error, precision, recall, F-measure
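The slides do not give the formulas behind these evaluation figures. The Python sketch below assumes the usual MUC-style definitions (no partial credit) and should be read as an illustration rather than as the official TERN scorer; the function name and the counts in the example are made up.

```python
def muc_scores(correct, incorrect, missing, spurious):
    """Compute MUC-style figures from the four basic counts (assumed definitions)."""
    possible = correct + incorrect + missing   # expressions in the reference annotation
    actual = correct + incorrect + spurious    # expressions produced by the system
    total = correct + incorrect + missing + spurious

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(correct, actual)
    recall = ratio(correct, possible)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "undergeneration": ratio(missing, possible),
        "overgeneration": ratio(spurious, actual),
        "substitution": ratio(incorrect, correct + incorrect),
        "error": ratio(incorrect + missing + spurious, total),
    }

# Hypothetical counts, for illustration only.
scores = muc_scores(correct=80, incorrect=10, missing=10, spurious=20)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```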


TERN-2004: Detection

• Participants:
– English: 6; Chinese: 4

• All the participants viewed the task as a supervised learning problem, and used the available annotated data to train their system (SVM, Maximum Entropy, HMM)

• Features considered:
– LEXICAL: tokens, n-grams, prefixes and suffixes, capitalization, digits, punctuation
– SYNTACTIC: Parts of Speech, chunks, syntactic patterns, patterns of numerical date expressions
– TASK SPECIFIC: timex dictionary, other taggers


TERN-2004: English Detection + Normalization

• Participants:
– English: 6; Chinese: 0

• All the participants addressed the task with rather similar rule-based approaches
– 2-step strategy:
• detection and bracketing
• normalization
– Similar linguistic preprocessing: POS-tagging, chunking
– Normalization is still considered out of the reach of ML


The CHRONOS system

• Extension of our multilingual (Eng/Ita) NER system
– Rule-based approach to recognize: <PERSON>, <LOCATION>, <ORGANIZATION>, <MEASURE>, <MONEY>, <CARDINAL>, <PERCENT>; <DATE>, <DURATION>, <TIME> mapped to <TIMEX2>
– Proper names (“Galileo Galilei”) and trigger words (“astronomer”) are mined from the WordNet hierarchy

Advantages:
1) reduced effort to create/maintain reliable gazetteers (261 proper name hyponyms of calendar_day#1)
2) useful basis for multilinguality


CHRONOS: Architecture

Plain English Text → [Detection and Bracketing: Tokenization, POS Tagging & Multiwords Recognition → Basic Rules Application → Composition Rules Application] → Tagged Text

OK for (non-normalized) <PERSON>, <LOCATION>, <ORGANIZATION>, <MEASURE>, <MONEY>, <CARDINAL>, <PERCENT>

TIMEX2?


CHRONOS: Architecture

Plain English Text → [Detection and Bracketing: Tokenization, POS Tagging & Multiwords Recognition → Basic Rules Application → Composition Rules Application] → Intermediate Annotation → [Normalization: Anchors Selection → Dates Normalization → Attributes Normalization] → Tagged Text


Detection & Bracketing: Basic Rules

• ~1500 hand-crafted rules (~1.5PM)
– Regular expressions checking for word senses, parts of speech, symbols, words satisfying specific predicates

• Detection
– Markable expressions are detected considering the presence in the input text of lexical triggers
• “year”, “Seventies”, “Friday”, “Christmas”, “today”, “daily”, “09/23/2004”, “1970s”, etc.

• Bracketing
– Considers the context surrounding the detected triggers
• “beginning”, “end”, “previous”, “next”, “ago”, “later”, “before”, “during”, “nearly”, “almost”, “3”, “sixth”, etc.


Detection & Bracketing: Basic Rules cont.

• Information gathering
– Goal: mine relevant information for normalization
– Considers triggers + context to fill:

TIMEX2 attributes
MOD: “more than”, “approximately” …
SET: “every”, “twice a” …
ANCHOR_DIR: “before”, “ago”, “during” …

TEMPORARY attributes
type: [T-ABS | T-REL]
t-cat: [second, minute, hour, day, …, millennium]
op: [=, +, -]
quant: [n≥0]

Example: “Nearly three years later”
MOD = LESS_THAN
ANCHOR_DIR = ENDING
type = T-REL
t-cat = YEAR
op = +
quant = 3


Basic Rules: an Example

PATTERN: t1 t2 t3 t4
t1: [pred = approx-p]
t2: [pred = number-p]
t3: [lemma = “year”]
t4: [lemma = “later”]

OUTPUT (intermediate annotation)

<TIMEX2 val="?" anchor_val="?" mod="LESS_THAN" anchor_dir="ENDING" type="T-REL" t-cat="year" quant="t2" op="+">t1 t2 t3 t4</TIMEX2>

TIMEX2 attributes: mod, anchor_dir
Temporary attributes: type, t-cat, quant, op
Values to be determined: val, anchor_val

A basic rule matching “Nearly three years later”
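The rule above can be approximated in Python as follows. This is a sketch, not the CHRONOS rule engine: the two small dictionaries stand in for the `approx-p` and `number-p` predicates, the lemmatizer is a toy, mapping “almost” to LESS_THAN is an assumption, and `quant` is filled with the numeric value of t2.

```python
# Hypothetical lexicons standing in for the approx-p and number-p predicates.
APPROX_MOD = {"nearly": "LESS_THAN", "almost": "LESS_THAN"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def lemma(token):
    """Toy lemmatizer: lowercases and maps "years" to "year"."""
    token = token.lower()
    return "year" if token == "years" else token

def apply_rule(tokens):
    """Return the intermediate annotation if the four tokens match the pattern, else None."""
    if len(tokens) != 4:
        return None
    t1, t2, t3, t4 = (lemma(t) for t in tokens)
    if t1 in APPROX_MOD and t2 in NUMBER_WORDS and t3 == "year" and t4 == "later":
        return {
            "text": " ".join(tokens),
            "val": "?",               # to be determined during normalization
            "anchor_val": "?",        # to be determined during normalization
            "mod": APPROX_MOD[t1],    # TIMEX2 attribute
            "anchor_dir": "ENDING",   # TIMEX2 attribute
            "type": "T-REL",          # temporary attribute
            "t-cat": "year",          # temporary attribute
            "quant": NUMBER_WORDS[t2],  # temporary attribute
            "op": "+",                # temporary attribute
        }
    return None

annotation = apply_rule("Nearly three years later".split())
print(annotation)
```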


Detection & Bracketing: Composition Rules

• Handle conflicts between possible multiple taggings

“I traveled for the whole Monday night”

Candidate taggings:
– Monday
– the whole Monday
– Monday night
– the whole Monday night
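The slides do not state which policy the composition rules apply. One plausible policy, assumed here purely for illustration, is to keep the longest of the overlapping candidates. The Python sketch below represents each candidate as a (start, end) token span with an exclusive end; all names are hypothetical.

```python
tokens = "I traveled for the whole Monday night".split()

# Candidate taggings from the slide as (start, end) token spans, end exclusive.
candidates = [
    (5, 6),  # Monday
    (3, 6),  # the whole Monday
    (5, 7),  # Monday night
    (3, 7),  # the whole Monday night
]

def resolve(spans):
    """Greedy policy (an assumption): prefer longer spans, drop overlapping shorter ones."""
    kept = []
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

for start, end in resolve(candidates):
    print(" ".join(tokens[start:end]))
```

Under this assumption the four conflicting candidates collapse into the single expression “the whole Monday night”.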


Normalization

• Anchors Selection (only for T-RELs)
– Goal: connect each T-REL to an anchor
– 2 heuristics:

CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “NYT20001025.1839.0279.sgm”)

PR-DATE: connects a T-REL to the nearest time expression with a compatible granularity (a t-cat with at least the same degree of specificity).

t-cat= “month” “month”, “week”, “day”, “decade”
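The two heuristics can be sketched as follows. This is an illustration under stated assumptions, not the CHRONOS code: the coarse-to-fine `GRANULARITY` scale, the reading of “nearest” as smallest absolute distance in token positions, and both function names are hypothetical.

```python
import re
from datetime import date

# Assumed scale from coarse to fine; the slides do not give the full ordering.
GRANULARITY = ["decade", "year", "month", "week", "day"]


def cr_date_from_name(doc_name):
    """Induce the creation date from a name like NYT20001025.1839.0279.sgm."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    return date(year, month, day)


def pr_date(trel_pos, trel_cat, expressions):
    """Pick the nearest expression that is at least as specific as the T-REL.

    expressions: list of (position, t_cat, val) tuples for other time
    expressions in the document.
    """
    needed = GRANULARITY.index(trel_cat)
    candidates = [
        e for e in expressions
        if e[0] != trel_pos and GRANULARITY.index(e[1]) >= needed
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e[0] - trel_pos))


print(cr_date_from_name("NYT20001025.1839.0279.sgm"))
```

For the document name quoted on the slide, the first function yields 25 October 2000.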


Normalization cont.

PR-DATE
– trigger: former, then, that, it
– trigger+context: following+trigger, previous+trigger, same+trigger, that+trigger, trigger+before, trigger+later

CR-DATE
– trigger: yesterday, today, tonight, now, Monday, …, Sunday, January, …, December
– trigger+context: this+trigger, last+trigger, next+trigger, past+trigger, the+trigger, trigger+ago
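The trigger table lends itself to a simple lookup. The sketch below is hypothetical: the function name, the decision to test context patterns before bare triggers, and the decision to test PR-DATE patterns before CR-DATE ones are assumptions, since the slides do not give a precedence order.

```python
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

PR_TRIGGERS = {"former", "then", "that", "it"}
PR_BEFORE = {"following", "previous", "same", "that"}  # context + trigger
PR_AFTER = {"before", "later"}                         # trigger + context
CR_TRIGGERS = {"yesterday", "today", "tonight", "now"} | WEEKDAYS | MONTHS
CR_BEFORE = {"this", "last", "next", "past", "the"}    # context + trigger
CR_AFTER = {"ago"}                                     # trigger + context


def choose_heuristic(words):
    """Return "PR-DATE", "CR-DATE", or None for a tokenized time expression."""
    words = [w.lower() for w in words]
    if not words:
        return None
    head, last = words[:-1], words[-1]
    if len(words) > 1:
        # Assumed precedence: context patterns first, PR-DATE before CR-DATE.
        if any(w in PR_BEFORE for w in head) or last in PR_AFTER:
            return "PR-DATE"
        if any(w in CR_BEFORE for w in head) or last in CR_AFTER:
            return "CR-DATE"
    if any(w in PR_TRIGGERS for w in words):
        return "PR-DATE"
    if any(w in CR_TRIGGERS for w in words):
        return "CR-DATE"
    return None
```

Checking PR-DATE contexts first lets “the following week” resolve to PR-DATE even though it also begins with “the”.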


Normalization cont.

• Dates Normalization
– Goal: fill the VAL attribute of each detected time expression

T-ABS: regular expressions considering their surface form (“1990s” → “199”)

T-REL: rewriting rules considering
– the anchor (e.g. “2001”)
– the operator (“OP”) to be applied (e.g. “+”)
– the quantity (“QUANT”) to be added/subtracted (e.g. “3”)

Example: “three years later” with anchor “2001”, OP “+” and QUANT “3” → “2004”
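A minimal sketch of both normalization paths, restricted to year granularity. The function names and the two regular expressions are assumptions for illustration; the actual CHRONOS patterns and rewriting rules are not shown in the slides.

```python
import re


def normalize_t_abs(text):
    """Derive VAL from the surface form of an absolute expression."""
    m = re.fullmatch(r"(\d{3})0s", text)   # decades such as "1990s"
    if m:
        return m.group(1)
    if re.fullmatch(r"\d{4}", text):       # plain four-digit years
        return text
    return None


def normalize_t_rel(anchor, op, quant):
    """Apply the operator and quantity to a year-level anchor value."""
    year = int(anchor)
    if op == "+":
        return str(year + quant)
    if op == "-":
        return str(year - quant)
    raise ValueError(f"unsupported operator: {op!r}")


print(normalize_t_abs("1990s"))           # decade value
print(normalize_t_rel("2001", "+", 3))    # "three years later"
```

Finer granularities (months, weeks, days) would need calendar arithmetic instead of integer addition.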


Normalization cont.

• Attributes Normalization
– Goal: produce the final tagged text
– Removes temporary attributes
– Introduces the normalized attributes “ANCHOR_VAL” and “ANCHOR_DIR”
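As an illustration, this step could be sketched as a filter over the intermediate annotation. The set of temporary attribute names is taken from the basic-rule example earlier in the deck; the function name, the upper-casing of attribute names and the dropping of unresolved “?” values are assumptions.

```python
# Temporary attributes, as they appear in the basic-rule example.
TEMPORARY = ("type", "t-cat", "quant", "op")


def finalize(annotation, text):
    """Drop temporary and unresolved attributes and render the final tag."""
    attrs = {
        name: value
        for name, value in annotation.items()
        if name not in TEMPORARY and value not in (None, "?")
    }
    rendered = " ".join(f'{name.upper()}="{value}"' for name, value in attrs.items())
    return f"<TIMEX2 {rendered}>{text}</TIMEX2>"


intermediate = {
    "val": "2004", "anchor_val": "2001",
    "mod": "LESS_THAN", "anchor_dir": "ENDING",
    "type": "T-REL", "t-cat": "year", "quant": 3, "op": "+",
}
print(finalize(intermediate, "Nearly three years later"))
```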


Outline

• Reasoning about Time
• TIMEX2
• The TERN Experience
– Overview
– CHRONOS Architecture
– Detection and Bracketing
– Normalization
– TERN-2004 Results
• CHRONOS and ONTOTEXT


TERN-2004: English Detection

[Bar chart: F-measure (0 to 1) on TIMEX2 and TEXT for the 6 participating sites: CU, IBM, LingPipe, MetaCarta, Sheffield, Amsterdam. Values labelled on the chart: 0,944 and 0,872; 0,841 and 0,716.]


TERN-2004: English Detection

[Bar chart: F-measure (0 to 1) on TIMEX2 and TEXT for Broadcast News vs. Newswire; system shown: IBM.]

• Similar results between the two sources

• The same trend holds among most systems


TERN-2004: Chinese Detection

[Bar chart: F-measure (0 to 1) on TIMEX2 and TEXT for the 4 participating sites: CU, LingPipe, PolyU, Sheffield. Values labelled on the chart: 0,927 and 0,825; 0,724 and 0,567.]

• Minimal drop-off from English (94%) to Chinese (93%) for TIMEX2


TERN-2004: English Detection + Normalization

[Bar chart: F-measure (0 to 1) on TIMEX2 and TEXT for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0,95 (0,944) and 0,849 (0,872).]

• Comparable performance of the two top systems with respect to English detection


TERN-2004: English Detection + Normalization

[Bar chart: F-measure (0 to 1) on the VAL, MOD and SET attributes for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0,872, 0,864, 0,774.]

• Three systems score above 85% for the VAL attribute


TERN-2004: English Detection + Normalization

[Bar chart: F-measure (0 to 1) on the A-DIR and A-VAL attributes for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0,760 and 0,726.]


TERN-2004 Results: CHRONOS

TAG                 POSS   ACT  CORR  INCO  MISS  SPUR   PREC    REC      F
TIMEX2              1828  1648  1609     0   219    39  0.976  0.880  0.926
TIMEX2:ANCHOR_DIR    351   294   245    26    80    23  0.833  0.698  0.760
TIMEX2:ANCHOR_VAL    351   398   272    56    23    70  0.683  0.775  0.726
TIMEX2:MOD            50    43    36     1    13     6  0.837  0.720  0.774
TIMEX2:SET            39    25    22     0    17     3  0.880  0.564  0.688
TIMEX2:TEXT         1828  1648  1458   151   219    39  0.885  0.798  0.839
TIMEX2:VAL          1569  1560  1365   190    14     5  0.875  0.870  0.872

Row groups: Detection (TIMEX2), Bracketing (TIMEX2:TEXT), Normalization (ANCHOR_DIR, ANCHOR_VAL, MOD, SET, VAL)


TERN-2004: Wrap Up

• What worked
– The rule-based approach (competitive + easy to develop, maintain, and extend)

• What did not work
– Relatively high number of missing tags (219, i.e. 12% of the 1828 detectable time expressions in the reference)
– Poor recall on specific attributes: SET 0,564; ANCHOR_DIR 0,698


TERN-2004 Wrap Up: Future Directions

• Conflict resolution (impact on detection)
– Our implementation of composition rules ignores embedded time expressions (e.g. “The eve of the new year”, “Sixty years ago today”)

• Anaphoric expressions (impact on detection)
– Pronouns are not recognized by the system as possible triggers (e.g. “Evelyn has seen 80 winters. This, she says, was the coldest”)

• Apparent dates (spurious taggings)
– A stopword list of proper names (e.g. “USA Today”, “Daily Telegraph”, “20th Century Fox”) is just a partial solution (intrinsically incomplete)


TERN-2004 Wrap Up: Future Directions

• Reported speech (impact on normalization)
– The system’s ANCHOR_VAL selection heuristic fails with reported speech fragments:

“He concluded the 1998 annual meeting saying: ‘The next year will be the eve of a new era for our company’.”


Outline

• Reasoning about Time
• TIMEX2
• The TERN Experience
– Overview
– CHRONOS Architecture
– Detection and Bracketing
– Normalization
– Results
• CHRONOS and ONTOTEXT


CHRONOS and ONTOTEXT: Application Scenarios

• Given a topic (e.g. “Lorenzo Dellai”), present the user with related information in chronological order

…
1990-XX-XX: at just 31, Lorenzo Dellai becomes the youngest mayor of a regional capital
1995-XX-XX: Dellai re-elected directly by the citizens with an absolute majority of the votes
2003-09-27: Dellai forces his way into the internal affairs of the DS list; after an initial burst of pride, they let themselves be humiliated and accept the role of satellites.
2004-10-26: The outgoing president of the provincial government, Lorenzo Dellai (centre-left), is the first 'governor' in the history of the Autonomous Province of Trento, elected with 169.913 votes, equal to 60,82%.
…
2004-11-25: INFORMATION CONFERENCE on the state of the industrial sector in Trentino […] Lorenzo Dellai closes the debate


CHRONOS and ONTOTEXT: Application Scenarios cont.

• Given a topic and a year (e.g. “Cantina La-vis”, “2003”), present related information in chronological order

2003-06-14: According to Fausto Peratoner, director of Cantina La-vis, “Chardonnay should not be seen only as a grape variety of globalized viticulture. In Trentino, it is a ‘naturalized variety’.”

2003-08-25: With the marriage between Cantina La-Vis and Cantina Val di Cembra, the third pole of Trentino viticulture is born today.

2003-12-03: the director of Cantina La-Vis recalled the 5 million bottles, plus another million of Cesarini Sforza sparkling wine, the more than 40 million in turnover, and the 1.300 hectares of vineyards.

• Given a topic and a date (e.g. “Gianni Marangoni”, “2004-11-20”), retrieve related news articles from the DB

Yesterday in Rovereto, shortly after 19:00, five men armed with pistols and knives stormed the villa of Gianni Marangoni
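Once every time expression carries a normalized value, the topic and topic-plus-year scenarios reduce to filtering and sorting on that value. The sketch below is hypothetical: the `(val, text)` pair layout, the case-insensitive substring match on the topic, and the function name are assumptions, and the snippet texts are shortened placeholders.

```python
def timeline(entries, topic, year=None):
    """Return (val, text) pairs mentioning the topic, in chronological order.

    entries: iterable of (val, text), where val is a normalized value such as
    "2003-06-14" or an underspecified one such as "1990-XX-XX".
    """
    hits = [
        (val, text)
        for val, text in entries
        if topic.lower() in text.lower()
        and (year is None or val.startswith(year))
    ]
    # Plain string ordering works for these values; "X" sorts after digits,
    # so underspecified dates follow fully specified ones in the same period.
    return sorted(hits, key=lambda entry: entry[0])


news = [
    ("2003-12-03", "the director of Cantina La-Vis recalled ..."),
    ("2003-06-14", "According to the director of Cantina La-vis ..."),
    ("2004-01-10", "Cantina La-Vis ..."),
    ("2003-08-25", "an unrelated item"),
]
for val, text in timeline(news, "Cantina La-vis", "2003"):
    print(val, text)
```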


CHRONOS and ONTOTEXT: What we have done (Sept. 2004-now)

• A new detection & bracketing component, based on ML

• SVMlight was used

• Features considered: PoS, token, lemma, punctuation, capitalization, hyphenation, collocations of words and tokens
– A specific gazetteer of “temporal terms” has been mined from WordNet and will be used for further improvements

• Performance is close to the state of the art: 0,83 F-Measure over the TEXT attribute (best system: 0,87; average system: 0,72; 3rd rank)

[Bar chart: F-measure (0 to 1) for CU, IBM, LingPipe, MetaCarta, Sheffield, Amsterdam and CHRONOS-SVM.]

For details: Gliozzo et al.: Instance Pruning by Filtering Uninformative Words: an Information Extraction Case Study, to appear at CICLing 2005


CHRONOS and ONTOTEXT: Roadmap

• Short-term: Porting to Italian (ongoing activity)
– Rewriting basic rules

• Mid-term: CHRONOS2 (ongoing activity)
– Modularization
– Integration with NERD

• Long-term: events
– Temporal anchoring
– Temporal ordering


The end


TERN-2004: Chinese Detection

[Bar chart: F-measure (0 to 1) on TIMEX2 and TEXT for Broadcast News vs. Newswire; system shown: Polytechnic University.]


TIMEX2: Lexical Triggers

Noun
– Lexical triggers: minute, afternoon, midnight, day, weekend, month, summer, season, quarter, era, period, future, past, ...
– Non-triggers: instant, episode, occasion, timetable, reign, …

Proper name
– Lexical triggers: Monday, January, Christmas, etc.

Specialized time patterns
– Lexical triggers: 8:00, 12/2/2000, 1994, 1960s, …

Adjective
– Lexical triggers: recent, former, current, future, daily, semiannual, biannual, daytime, ago, preseason, …
– Non-triggers: early, ahead, next, subsequent, frequent, later, contemporary, …

Adverb
– Lexical triggers: currently, lately, hourly, daily, monthly, ago, …
– Non-triggers: earlier, immediately, instantly, meanwhile, next, following, later, soon, eventually, …

Time noun/adverb
– Lexical triggers: now, today, tomorrow, …

Number
– Lexical triggers: 3, three, third, Sixties, …


TERN: Evaluation Figures

• For each item in the aligned reference/system output:
– Corr: the two items are identical
– Inco: the two items are not identical
– Miss: a reference item has no system output aligned with it
– Spur: a system output has no reference item aligned with it

• Given a set of corr, inco, miss, spur values:
– Possible (POSS): CORR + INCO + MISS
– Actual (ACT): CORR + INCO + SPUR
– Undergeneration: MISS / POSS
– Overgeneration: SPUR / ACT
– Substitution: INCO / (CORR + INCO)
– Error rate: (INCO + SPUR + MISS) / (CORR + INCO + SPUR + MISS)
– Precision (P): CORR / ACT
– Recall (R): CORR / POSS
– F-measure: 2*P*R / (P + R)
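The figures above can be computed directly from the four counts. The sketch below is only an illustration (the function name and the dictionary layout are assumptions); it is checked against the TIMEX2 row of the CHRONOS results table (CORR 1609, INCO 0, MISS 219, SPUR 39).

```python
def tern_scores(corr, inco, miss, spur):
    """Compute the TERN evaluation figures from the four alignment counts.

    No guard against empty denominators is included in this sketch.
    """
    poss = corr + inco + miss
    act = corr + inco + spur
    precision = corr / act
    recall = corr / poss
    return {
        "possible": poss,
        "actual": act,
        "undergeneration": miss / poss,
        "overgeneration": spur / act,
        "substitution": inco / (corr + inco),
        "error_rate": (inco + spur + miss) / (corr + inco + spur + miss),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


scores = tern_scores(corr=1609, inco=0, miss=219, spur=39)
print(round(scores["precision"], 3),
      round(scores["recall"], 3),
      round(scores["f_measure"], 3))
```

Rounded to three decimals, these values reproduce the precision, recall and F-measure reported for the TIMEX2 tag.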



Notes

• Training data (annotated with TIMEX2 tags)
– English: 862 files (306K words)
– Chinese: 503 files (158K words)

• Evaluation corpus
– English: 50K words
– Chinese: 50K words

• Humans are always aware of their temporal location (day, month, year) and use context-dependent time expressions (e.g. “today”, “next week”)

Given a temporal expression, interpreting its meaning amounts to finding its correct position on a timeline


TERN-2004: English Detection + Normalization

[Bar chart: F-measure (0 to 1) on TIMEX2, A-DIR, A-VAL, MOD, SET, TEXT and VAL for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam.]


Temporal Ordering: related issues

• In news, events aren’t usually described in the (narrative) order in which they occur
– Temporal structure dictated by perceived news value
  • Latest news usually presented first
– News sometimes expresses multiple viewpoints, with commentaries, eyewitness recapitulations, etc.

• Temporal ordering appears to involve a variety of knowledge sources
– Tense & aspect
  • Max entered the room. Mary stood up/was seated on the desk.
– Temporal adverbials
  • Simpson made the call at 3. Later, he was spotted driving towards Westwood.
– Rhetorical relations and World Knowledge
  • Narration: Max stood up. John greeted him.
  • Cause/Explanation: Max fell. John pushed him.
  • Background: Boutros-Ghali Sunday opened a meeting in Nairobi. He arrived in Nairobi from South Africa.