RTE-7@TAC2011: The Seventh Recognizing Textual Entailment Challenge
Luisa Bentivogli (coordinator, CELCT & FBK-irst)
Danilo Giampiccolo (coordinator, CELCT)
Hoa Trang Dang (NIST)
Ido Dagan (Bar Ilan University)
Peter Clark (Vulcan Inc.)
Outline
• The RTE Challenge
• RTE-7 Main Task: RTE within a Corpus
– RTE-7 Novelty Detection Subtask
– Knowledge Resources and Tools for RTE
• RTE-7 KBP Validation Task
• Conclusion and Future Perspectives
NIST - November 14, 2011 RTE-7@TAC2011
Textual Entailment
Textual entailment is a directional relation between two text fragments:
• the entailing text, called T(ext)
• the entailed text, called H(ypothesis)
T entails H if, typically, a human reading T would infer that H is most likely true
Examples
• YES
T: The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll.
H: Jill Carroll was abducted in Iraq.
• NO
T: The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll.
H: Jill Carroll is the daughter of Mary Beth Carroll.
The RTE-7 Challenge
Replicates the same tasks as in RTE-6
to allow participants to address the novelties introduced for the first time in RTE-6:
– Main Task: Textual Entailment within a Corpus (Piloted in RTE-5 - Summarization setting)
– Novelty Detection Subtask (based on the Main Task)
– KBP Validation Task (Knowledge Base Population setting)
– Exploratory effort on resource evaluation extended to tools
RTE-7 Participants
• Number of participants: 13
– RTE-1: 18, RTE-2: 23, RTE-3: 26, RTE-4: 26, RTE-5: 21, RTE-6: 18
• Provenance
– ASIA: 8
– EUROPE: 5
• Participants per task
– Main Task: 13 (33 runs)
– Novelty Detection Subtask: 5 (13 runs)
– KBP Validation Pilot Task: 2 (8 runs)
RTE-7 Main Task Description
• Given
– a corpus
– a hypothesis H
– a set of "candidate" entailing sentences for that H, retrieved by Lucene from the corpus
• RTE systems are required
– to identify all the sentences among the candidate sentences that entail the given H
RTE-7 Main Task Example
Topic 918: Betty Friedan
Hs SET
H380: Betty Friedan is the author of "The Feminine Mystique."
H391: "The Feminine Mystique" was published in 1963.
H401: In 1962, Judy Mott was laid off from her job with Sears.
Document 1
S1: Betty Friedan, a founder of the modern feminist movement in the United States, died here Saturday of congestive heart failure, feminist leaders announced.
S2: She was 85.
S3: Friedan achieved prominence in 1963 with the publication of her book "The Feminine Mystique," which detailed the lives of American women who were expected to find fulfillment through the achievements of their husbands and children.
S4: The book sparked a movement for a re-evaluation of women's role in American society and is credited with laying the foundation of modern feminism.
S5: She was a founder of the National Organization for Women and a leading advocate of the Equal Rights Amendment, a proposed amendment to the US constitution banning sex-based discrimination, women's rights activists said.
S6: "The movement that Friedan's energy sparked continues to grow, and is bigger today than she could ever have dreamed …
…
Document 2
S1: Betty Friedan, the visionary, combative feminist who launched a social revolution with her provocative 1963 book, "The Feminine Mystique," died Saturday, which was her 85th birthday.
S2: Friedan died of congestive heart failure at her home in Washington, D.C., according to Emily Bazelon, a cousin who was speaking for the family.
S3: She said Friedan had been in failing health for some time.
S4: Her best-selling book identified "the problem that has no name," the unhappiness of post-World War II American women unfulfilled by traditional notions of female domesticity.
S5: Melding sociology and humanistic psychology, the book became the cornerstone of one of the last century's most profound movements, unleashing the first full flowering of American feminism since the 1800s.
S6: It gave Friedan, an obscure suburban New York housewife and freelance writer, the mantle to...
…
Document 3
S26: What is perhaps most surprising, though, is not that feminists like Hirshman believe homemaking is second-class drudgery, but that so many people still get worked up over the issue.
S27: After all, feminist thinkers have been proclaiming the need to free women from the bondage of housework for a long time.
S28: It is, as Hirshman freely acknowledges, precisely what Friedan argued in "The Feminine Mystique," first published more than 40 years ago.
S29: "The only kind of work which permits an able woman to realize her abilities fully," Friedan wrote, "is the kind that was forbidden by the feminine mystique, the lifelong commitment to an art or science, to politics or profession."
S30: Not homemaking, not motherhood.
S31: In an interview, Hirshman said that in the course of researching a book, she began to wonder when feminism switched from offering a clear blueprint for liberation to choosing from Column A and Column B.
…
Topic 918: Betty Friedan
H380: Betty Friedan is the author of "The Feminine Mystique"
Entailing sentences for H380:
Document 1, S3: Friedan achieved prominence in 1963 with the publication of her book "The Feminine Mystique," which detailed the lives of American women ...
Document 2, S1: Betty Friedan, the visionary, combative feminist who launched a social revolution with her provocative 1963 book, "The Feminine Mystique," died …
Document 3, S28: It is, as Hirshman freely acknowledges, precisely what Friedan argued in her book "The Feminine Mystique," first published...
RTE-7 Main Data Set (1/2)
TAC 2008 and 2009 SUM Update scenario
For each topic:
[Figure: Cluster A and Cluster B placed along a Time axis; labels: Initial Summary, Update Summary]
RTE-7 Main Data Set (2/2)
Topic 918: Betty Friedan
Cluster A: Document 1, Document 2, Document 3
Automatic summary sentence: In The Guardian, Germaine Greer took critical measure of a fellow feminist, Betty Friedan, the author of “The Feminine Mystique” who died on Feb. 4 at 85.
Hs SET: 20-40 standalone sentences:
- based on the “B” summary sentences of the 3 best scoring SUM systems
- based directly on Cluster “A” sentences
H380: Betty Friedan is the author of "The Feminine Mystique."
H381: Betty Friedan died on February 4, 2006.
H382: Betty Friedan died at 85.
H397: In The Guardian, Germaine Greer took critical measure of Betty Friedan.
Up to 100 “candidate” entailing sentences
- Information Retrieval filtering phase:
  - the H is the query
  - the corpus sentences are “the documents” to be retrieved for the query
  - the 100 top-ranked sentences are selected as candidates (80% of all the entailing sentences in the corpus)
- LUCENE text search engine (v. 2.9.1):
  - StandardAnalyzer, Boolean “OR” query, default Lucene ranking
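The filtering phase described above can be approximated in a few lines. This is only a minimal sketch, not the Lucene 2.9.1 pipeline used to build the data set: the tokenizer, the TF-IDF-style score and the names `tokenize` and `top_candidates` are assumptions made for illustration, and Lucene's default ranking differs in its details.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a rough stand-in for Lucene's StandardAnalyzer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def top_candidates(hypothesis, sentences, k=100):
    """Rank corpus sentences against H used as an OR query and keep the top k."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency of each term, treating every sentence as a "document".
    df = Counter(term for d in docs for term in d)
    query = set(tokenize(hypothesis))
    scored = []
    for idx, d in enumerate(docs):
        # OR semantics: a sentence matches if it shares at least one query term.
        score = sum(d[t] * math.log(1 + n / df[t]) for t in query if t in d)
        if score > 0:
            scored.append((score, idx))
    # Highest score first; ties broken by position in the corpus.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

sentences = [
    "She was 85.",
    'Friedan achieved prominence in 1963 with her book "The Feminine Mystique".',
    "Not homemaking, not motherhood.",
]
print(top_candidates('Betty Friedan is the author of "The Feminine Mystique"', sentences))
```

Only the returned indices would be passed on to an RTE system as candidates; sentences sharing no term with H are never retrieved, which is why such a filter cannot reach every entailing sentence in the corpus.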
Data Set Composition
• 3 annotations for the whole data set
• IAA (Kappa): 98.35% (Dev), 98.51% (Test)

                            DEVELOPMENT SET    TEST SET
Topics                      10                 10
Hypotheses                  284                269
  Entailment: yes | no      174 | 110          186 | 83
  Summaries: yes | no       193 | 91           192 | 77
Annotations                 21,420             22,426
“entailment” judgments      1,136              1,308
Main Task Evaluation
13 participants (33 runs)
• Evaluation measures:
– Precision, Recall, F-measure (micro-averaged)
• IR Baselines:

              Precision    Recall    F1
Lucene_5      37.00        37.84     37.41
Lucene_10     27.07        55.20     36.33
Lucene_15     21.15        64.65     31.85
Lucene_20     17.71        71.64     28.40
Lucene_100    5.83         100       11.02
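The micro-averaged measures listed above pool the counts over all hypotheses before computing the ratios. A minimal sketch, assuming each run is represented as a mapping from hypothesis id to the set of sentence ids judged entailing; the function name `micro_prf`, the data layout and the toy ids are illustrative and not the official scorer.

```python
def micro_prf(gold, predicted):
    """gold, predicted: dicts mapping hypothesis id -> set of entailing sentence ids."""
    tp = fp = fn = 0
    for h in set(gold) | set(predicted):
        g = gold.get(h, set())
        p = predicted.get(h, set())
        tp += len(g & p)   # sentences correctly returned
        fp += len(p - g)   # sentences returned but not entailing
        fn += len(g - p)   # entailing sentences that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy judgments: ids and labels are invented for illustration only.
gold = {"H380": {"D1_S3", "D2_S1", "D3_S28"}, "H401": set()}
pred = {"H380": {"D1_S3", "D2_S4"}, "H401": {"D1_S2"}}
p, r, f = micro_prf(gold, pred)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")
```

Because counts are summed before dividing, hypotheses with many entailing sentences weigh more than hypotheses with few, which is the usual difference from macro-averaging.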
Best Results

Team                  Precision    Recall    F-measure
IKOMA1                46.96        49.08     48.00
u_tokyo3              46.84        43.58     45.15
BUPTTeam1             45.02        44.95     44.99
CELI1                 41.88        46.56     44.10
DFKI2                 50.77        37.92     43.41
BIU2                  41.81        44.11     42.93
FBK_irst3             46.59        38.07     41.90
Baseline_Lucene5      37.00        37.84     37.41
te_iitb1              20.67        60.24     30.78
JU_CSE_TAC2           26.66        35.55     30.47
ICL1                  47.88        21.56     29.73
UAIC20112             30.21        25.84     27.85
SJTU_CIT3             17.92        33.33     23.31
SINAI3                47.30        8.72      14.72
Baseline_LuceneAll    5.83         100.00    11.02
Results: F-measure statistics

F-measure (best runs)    RTE-7    RTE-6
Highest                  48.00    48.01
Median                   41.90    36.14
Baseline_Lucene5         37.41    34.63
Average                  35.95    33.70
Lowest                   14.72    11.60
RTE-7 Novelty Detection Subtask
Goals:
• Specifically address the needs of the SUM Update Task, where it is necessary to distinguish between novel and non-novel information
• RTE engines could help summarization systems to filter out non-novel sentences from their summaries
RTE-7 Novelty Detection Subtask
Task:
Judge if the information contained in each H (from Cluster B) is novel with respect to the information contained in the set of (Cluster A) candidate entailing sentences
– If a given H:
• has entailing sentences = information is NOT novel
• has no entailing sentences = information is novel
RTE-7 Novelty Detection Subtask
Based on the Main Task:
• Uses only the Hs taken from the automatic summaries
• Same output format/annotation
– the novelty detection decision is derived automatically from the number of entailing sentences for each H
Differences:
• Systems are specifically tuned for novelty detection
• Specific scoring metrics designed for assessing novelty detection
Data Set Composition
• IAA (Kappa): 98.21% (Dev), 98.06% (Test)

                           DEVELOPMENT SET    TEST SET
Topics                     10                 10
Hypotheses                 254                302
  Novel                    159 (63%)          195 (65%)
“entailing” judgments      576                779
Evaluation Measures
5 participants (13 runs)
• Primary score: Novelty Detection evaluation
– Micro-averaged Precision, Recall and F-measure computed on the binary novel/non-novel decision
– derived automatically from the number of entailing sentences provided by the systems
• Secondary score: Justification evaluation
– measures the quality of the justifications provided for non-novel Hs
– Micro-averaged Precision, Recall and F-measure on the set of all the sentences extracted as entailing the Hs
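The primary score can be sketched as follows: the novel/non-novel decision is read off the number of entailing sentences, and Precision, Recall and F-measure are computed on that binary decision. The sketch assumes that "novel" is the positive class; the names `is_novel` and `novelty_prf`, the data layout and the placeholder sentence id are illustrative, not the official scorer. The toy gold set only mirrors the test-set counts (195 novel Hs out of 302).

```python
def is_novel(entailing_sentences):
    # An H is novel exactly when no candidate sentence entails it.
    return len(entailing_sentences) == 0

def novelty_prf(gold, predicted):
    """gold, predicted: dicts mapping hypothesis id -> set of entailing sentence ids."""
    tp = fp = fn = 0
    for h, gold_sentences in gold.items():
        gold_novel = is_novel(gold_sentences)
        pred_novel = is_novel(predicted.get(h, set()))
        if pred_novel and gold_novel:
            tp += 1
        elif pred_novel:
            fp += 1   # judged novel, but the gold standard has entailing sentences
        elif gold_novel:
            fn += 1   # judged non-novel, but the gold standard has none
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "All novel" baseline on a gold set shaped like the test set: 195 novel Hs out of 302.
gold = {f"H{i}": set() for i in range(195)}
gold.update({f"H{i}": {"S1"} for i in range(195, 302)})  # "S1" is a placeholder id
baseline = {h: set() for h in gold}
p, r, f = novelty_prf(gold, baseline)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")
```

With these counts the sketch gives 64.57 / 100 / 78.47, the same figures as the Baseline_all_new row of the primary-score results, which supports the choice of "novel" as the positive class.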
Best Results – Primary Score
Novelty Detection Evaluation

Run                 Precision    Recall    F1
IKOMA2              86.92        95.38     90.95
CELI1               88.83        85.64     87.21
JU_CSE_TAC1         80.18        93.33     86.26
BIU1                90.74        75.38     82.35
DFKI3               91.72        73.85     81.82
Baseline_all_new    64.57        100       78.47
Best Results – Secondary Score
Justification Evaluation

Run            Precision    Recall    F-measure
BIU3           36.34        40.31     38.22
DFKI2          38.36        33.63     35.84
IKOMA1         51.84        27.09     35.58
CELI1          37.92        33.25     35.43
JU_CSE_TAC2    21.94        33.63     26.56
Results: F-measure statistics

F-measure        Novelty Detection (best runs)    Justification, non-novel Hs (best runs)
                 RTE-7      RTE-6                 RTE-7      RTE-6
Highest          90.95      82.91                 38.22      48.26
Median           86.26      78.70                 35.58      35.59
Average          85.72      72.41                 34.32      32.38
Lowest           81.82      43.98                 26.56      3.79
Knowledge Resources and Tools for RTE
An exploratory effort aimed at studying the relevance of knowledge resources and tools in recognizing TE
• Ablation Tests for all knowledge resources and tools used in Main Task runs:
– remove one module at a time from a system, and re-run the system on the test set with the other modules, except the one tested
! Remove only knowledge resources or tools
! Remove one resource or tool at a time
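The ablation procedure can be sketched as a comparison between the complete run and the run with one module removed. The slides do not spell out how the impact is computed, so this sketch assumes it is the difference between the two F-measures (full minus ablated); the names `ablation_impact` and `summarize` and the sample scores are hypothetical.

```python
def ablation_impact(f_full, f_ablated):
    """Positive value: the removed module helped the system; negative: it hurt."""
    return f_full - f_ablated

def summarize(tests):
    """tests: list of (module, f_full, f_ablated), with F-measures in percent."""
    summary = {}
    for module, f_full, f_ablated in tests:
        impact = ablation_impact(f_full, f_ablated)
        bucket = summary.setdefault(module, {"positive": [], "negative": []})
        bucket["positive" if impact >= 0 else "negative"].append(round(impact, 2))
    return summary

# Hypothetical scores for two systems that both ablate the same resource.
tests = [("WordNet", 44.0, 40.0), ("WordNet", 30.0, 30.5)]
print(summarize(tests))
```

Grouping the results per module and per sign mirrors the layout of the ablation tables, where each resource or tool is reported with the number of tests in which it had a positive or a negative impact.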
Ablation Tests
• 31 ablation tests submitted (by 10 teams)
– 7 tests did not specifically address knowledge resources or tools
– 3 tests had a combination of different resources/components removed
• 21 ablation tests conformant to the requirements
– 16 tests for 7 different resources
– 5 tests for 2 different tools
Ablation Tests - Resources

Ablated Resource    # Ablation Tests    Impact on Systems
                                        Positive       Negative
WordNet             8                   5 (+9.81%)     3 (-0.14%)
Wikipedia           3                   2 (+8.89%)     1 (-2.64%)
VerbOcean           1                   1 (+5.93%)     -
DIRECT              1                   1 (+0.94%)     -
Paraphrase table    1                   -              1 (-1.43%)
CatVar              1                   1 (+0.84%)     -
Acronym Lists       1                   -              1 (-0.16%)
Ablation Tests - Tools

Ablated Tool                 # Ablation Tests    Impact on Systems
                                                 Positive       Negative
Named Entities Recognizer    4                   2 (+7.97%)     2 (-8.29%)
Coreference Resolver         1                   1 (+0.69%)     -
Remarks on the initiative
• With respect to RTE-5 and RTE-6:
– Resources: trends confirmed over the years
– Tools: RTE-6 trends not confirmed
• Lessons learned
– Ablation test results may provide an indication of the actual contribution of a component to the performance of a specific system
– BUT the value of a resource is very much dependent on how that resource is used and how it integrates with the rest of the system
– Need for a deeper understanding of how the resources and tools are used
The RTE-7 KBP Validation Task
Motivations:
• analyze the potential utility of RTE systems in another real NLP application scenario, i.e. the Knowledge Base Population Slot Filling task
• use Textual Entailment techniques to validate the output of an NLP system (similar to the AVE experiment in QA)
The KBP Slot Filling Task
Given an entity in a knowledge base and an attribute (slot) for that entity:
• find in a large corpus the correct value (filler) for that attribute
• return the extracted information together with a corpus document supporting it as a correct slot filler
The RTE-7 KBP Validation Task
• Initial assumption: an extracted slot filler is correct if and only if the supporting document entails a hypothesis summarizing the slot filler
• Task: determine whether a candidate slot filler is supported in the associated document using entailment techniques.
Data Set Creation
Each slot filler returned by KBP systems yields 1 RTE evaluation pair, where:
• T is the entire document supporting the slot filler
• H is a set of synonymous sentences, representing different realizations of the slot filler
Data Set Creation: example
KBP SYSTEM INPUT
Target Entity: Chris Simcox
Slot: Residences
Document collection
KBP SYSTEM OUTPUT
Slot Filler: “Tucson, Ariz.”
Supporting Document: NYT_ENG_20050919.0130.LDC2007T07
RTE EVALUATION PAIR
T: NYT_ENG_20050919.0130.LDC2007T07
H:
H1: Chris Simcox lives in Tucson, Ariz.
H2: Chris Simcox has residence in Tucson, Ariz.
H3: Tucson, Ariz. is the place of residence of Chris Simcox
H4: Chris Simcox resides in Tucson, Ariz.
H5: Chris Simcox’s home is in Tucson, Ariz.
Hypotheses Creation
Attribute: origin    Target entity: person
Slot filler: Canadian    Target entity: Chris Simcox
Manually created templates
Templ 1: X’s origins are in Y
Templ 2: X comes from Y
Templ 3: X is from Y
Templ 4: X origins are Y
Templ 5: X has Y origins
Templ 6: X is of Y origin
Hs
H1: CHRIS SIMCOX origins are in CANADIAN
H2: CHRIS SIMCOX comes from CANADIAN
H3: CHRIS SIMCOX is from CANADIAN
H4: CHRIS SIMCOX origins are CANADIAN
H5: CHRIS SIMCOX has CANADIAN origin
H6: CHRIS SIMCOX is of CANADIAN origin
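The template-filling step can be sketched as a placeholder substitution. This is an illustration under assumptions, not the organizers' generation script: the names `ORIGIN_TEMPLATES` and `instantiate` are hypothetical, a straight apostrophe stands in for the typographic one, and the surface form of some outputs may differ slightly from the Hs shown on the slide.

```python
import re

# The six manually created templates for the "origin" attribute of a person.
ORIGIN_TEMPLATES = [
    "X's origins are in Y",
    "X comes from Y",
    "X is from Y",
    "X origins are Y",
    "X has Y origins",
    "X is of Y origin",
]

def instantiate(templates, entity, filler):
    """Fill X with the target entity and Y with the slot filler, one H per template."""
    slots = {"X": entity, "Y": filler}
    # Single pass over standalone X / Y tokens, so text already inserted is never re-scanned.
    return [re.sub(r"\b[XY]\b", lambda m: slots[m.group()], t) for t in templates]

for i, h in enumerate(instantiate(ORIGIN_TEMPLATES, "CHRIS SIMCOX", "CANADIAN"), start=1):
    print(f"H{i}: {h}")
```

Since the filler is inserted verbatim, the resulting sentences can be ungrammatical (for example "comes from CANADIAN"), which matches the remark that the Hs of this task are possibly ungrammatical.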
Gold Standard Creation
KBP assessments → (automatically) → RTE gold standard annotations

KBP JUDGMENTS (4-valued)    ENTAILMENT VALUES (2-valued)
Correct                     YES
Redundant                   YES
Wrong                       NO
Inexact                     (not included)
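The mapping from KBP judgments to entailment values is a simple lookup. A minimal sketch, assuming the assessments arrive as (pair id, judgment) tuples; that layout and the names `KBP_TO_ENTAILMENT`, `gold_entailment` and `build_gold` are hypothetical.

```python
# 4-valued KBP judgments mapped onto 2-valued entailment; "Inexact" has no entry.
KBP_TO_ENTAILMENT = {"Correct": "YES", "Redundant": "YES", "Wrong": "NO"}

def gold_entailment(kbp_judgment):
    """Return "YES" or "NO", or None when the pair is left out of the data set."""
    return KBP_TO_ENTAILMENT.get(kbp_judgment)

def build_gold(assessments):
    """assessments: iterable of (pair_id, kbp_judgment) tuples (assumed layout)."""
    gold = {}
    for pair_id, judgment in assessments:
        label = gold_entailment(judgment)
        if label is not None:   # pairs judged "Inexact" are not included
            gold[pair_id] = label
    return gold

# Invented pair ids, for illustration only.
sample = [("p1", "Correct"), ("p2", "Redundant"), ("p3", "Wrong"), ("p4", "Inexact")]
print(build_gold(sample))
```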
Distinguishing Features
• RTE evaluation pair
– T is an entire document
– H is a set of synonymous sentences, possibly ungrammatical
• (Semi-)automatic generation
– Data Set
  • from KBP outputs
– Gold Standard
  • from KBP output assessments
Data Set Composition
Removed pair types: GPE; “inexact”; “NO_RESPONSE”; duplicates; speech transcriptions; “other_family” slot; web documents
DEVELOPMENT SET TEST SET
Combined RTE-6 Dev and Test sets
KBP ’11 Slot Filling Task assessments
24,014
Pairs: 24,808 (Development), 23,998 (Test)
Positive examples: 2,231 (Development), 1,508 (Test)
Negative examples: 22,577 (Development), 21,971 (Test)
Evaluation
2 TYPES OF SUBMISSIONS:
• generic systems (no adaptation)
• tailored systems (adapted for specific slots)
PARTICIPANTS: 2
SUBMITTED RUNS: 8
• 5 generic
• 3 tailored
EVALUATION MEASURES: Micro-Averaged Precision, Recall, F-measure
Pilot Task Baseline
Baseline: All Ts classified as entailing the corresponding H
This baseline:
• reflects the cumulative performance of all KBP Slot Filling Systems
• indicates the percentage of entailing pairs in the Test Set
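Because every pair is answered YES, the baseline's recall is 100% by construction and its precision equals the share of positive pairs. A small sketch of that arithmetic, using the positive and negative example counts listed for the test set; the function name `all_yes_baseline` is illustrative.

```python
def all_yes_baseline(positives, negatives):
    """Score a run that labels every pair as entailing."""
    total = positives + negatives
    precision = positives / total   # only the truly positive pairs are correct
    recall = 1.0                    # no positive pair is missed
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Test-set counts: 1,508 positive and 21,971 negative examples.
p, r, f = all_yes_baseline(1508, 21971)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")
```

These counts give 6.42 / 100 / 12.07, the figures reported in the Baseline row of the results.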
Results

TYPE        RUN            P        R        F1       RTE-6
Generic     JU_CSE_TAC2    11.79    49.14    19.02    25.5
Generic     CELI3          10.47    29.05    15.39    15.98
Baseline                   6.42     100      12.07    16.13
Tailored    JU_CSE_TAC2    10.97    55.9     18.34    33.07

- Overall system performance decreased with respect to RTE-6, especially for the “tailored” submission
- All runs are above the baseline
Conclusions
RTE-7 repeated RTE-6:
• Main Task: the results largely reflect those achieved in RTE-6 and show an improvement in overall system performance
• Novelty Subtask: the systems performed well and recorded a clear improvement with respect to RTE-6
– RTE confirmed its potential to help SUM systems filter out non-novel information
• KBP Validation Task: proved to be the most complex of the tasks proposed
Future Directions
See you all at the RTE Planning Session
Thank you!