TextMap: An Intelligent Question-Answering Assistant

Project Members: Abdessamad Echihabi, Ulf Hermjakob, Eduard Hovy, Kevin Knight, Daniel Marcu, Deepak Ravichandran
Research Foci and Accomplishments
• Increase q&a performance on simple, factoid-type questions
  – Learned surface text patterns for q&a
  – Incorporated pattern-based answering into TextMap
• Develop capability for answering cause/evidence and opinion questions
  – Learned to recognize causal/evidence relations in arbitrary texts
• Develop capability for answering complex questions
  – Answer "who is" questions as mini-biographies
• Design q&a interface and system architecture
  – Multiple q&a engines run in parallel
  – Dynamically ranked answers are presented to analysts as soon as they become available
Learning surface text patterns for q&a [Ravichandran and Hovy; ACL-2002]
• Motivation:
  – Surface text patterns can be used to answer certain factoid questions ("When was NAME born?")
    • <NAME> was born in <BIRTHDATE>
    • <NAME> (<BIRTHDATE> --
• Hypothesis:
  – Surface text patterns can be automatically learned from the Web
Approach
• Start with a small "seed" set of known answers to a given question type
  – "Gandhi 1869", "Newton 1642"
• Download documents from the web that contain the answers in a single sentence
• Use a suffix tree to find common substrings
  – "The great composer Mozart (1756-1791) achieved fame at a young age"
  – "Mozart (1756-1791) was a genius"
  – "The whole world would always be indebted to the great music of Mozart (1756-1791)"
• Replace tags in common substrings
  – <NAME> (<ANSWER> --
  – <NAME> was born in <ANSWER>
• Discard low-frequency patterns and measure precision (see the sketch below)
Initial results – TREC-10 questions

Answers found in the TREC corpus:
  Question type   #questions    MRR
  – Birthyear          8       0.4787
  – Inventors          6       0.1666
  – Discoverers        4       0.1250
  – Definitions      102       0.3445
  – Why-famous         3       0.6666
  – Locations         16       0.75

Answers found on the Web:
  Question type   #questions    MRR
  – Birthyear          8       0.6875
  – Inventors          6       0.5833
  – Discoverers        4       0.8750
  – Definitions      102       0.3857
  – Why-famous         3       0.0
  – Locations         16       0.8643
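For reference, MRR is mean reciprocal rank: each question contributes 1/rank of its first correct answer (0 if none is returned), averaged over the question set. A minimal computation with made-up ranks:

```python
# Mean reciprocal rank: 1/rank of the first correct answer per question,
# 0 when no correct answer is returned. The ranks below are made up.
def mrr(first_correct_ranks):
    return sum(1.0 / r if r else 0.0 for r in first_correct_ranks) / len(first_correct_ranks)

print(mrr([1, 2, None, 1]))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```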
Future work
• Incorporate semantic filters to block answers of the wrong type:
  – Mozart was born in Salzburg
• Learn to deal with long-distance dependencies and answers of some expected length
  – <QUESTION> lies on <ANSWER>
  – "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
Answering cause/evidence questions
• Motivation
  – Question: Why did people die in Burundi?
    • "In Burundi, 179 people died. The flood that hit the capital was the largest ever recorded."
  – Answer: "because of the flood"
  – To produce this answer, we need to identify that a cause/evidence relation holds between the sentences above.
  – The answer is implicit.
Recognizing discourse relations in texts [Marcu and Echihabi; ACL-2002]
• "Such standards would preclude arms sales to states like Libya, which is also currently subject to U.N. embargo."
• "??? states like Rwanda before its present crisis would still be able to legally buy arms."

  BUT

  ¬Can_buy_arms_legally(Libya)
  Can_buy_arms_legally(Rwanda)
  Similar(Libya, Rwanda)

  P(BUT/Contrast | <embargo, legally>) is high
Approach
• Collect a corpus of 1 billion words of English (41M sentences)
• Use simple pattern matching to automatically extract MANY examples of contrast, cause, elaboration, … relations (see the sketch below):
  – [BOS … EOS] [BOS But … EOS]
  – [BOS … ] [but … EOS]
  – [BOS Although … ] [, … EOS]

  Relation                        # of examples
  Contrast                            3,881,588
  Cause-Explanation-Evidence            889,946
  Condition                           1,203,813
  Elaboration                         1,836,227
  No-Relation-Same-Text               1,000,000
  No-Relation-Different-Texts         1,000,000
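A rough sketch of this mining step, assuming sentence-split input. The toy text and the restriction to two Contrast templates (the paper treats both "But" and "Although" as contrast markers) are simplifications of the actual extraction rules:

```python
# Mine relation-labeled training examples with the simple templates above;
# the toy text stands in for the 41M-sentence corpus.
import re

text = ("John is good in math and sciences. But Paul is rotten at both. "
        "Although it was raining, the game went ahead.")

sentences = re.split(r"(?<=[.!?])\s+", text)

examples = []

# [BOS ... EOS] [BOS But ... EOS]  ->  Contrast
for prev, cur in zip(sentences, sentences[1:]):
    if cur.startswith("But "):
        examples.append((prev, cur[len("But "):], "CONTRAST"))

# [BOS Although ...][, ... EOS]  ->  Contrast
for s in sentences:
    m = re.match(r"Although ([^,]+), (.+)", s)
    if m:
        examples.append((m.group(1), m.group(2), "CONTRAST"))

print(examples)
```

The cue phrases themselves are removed from the extracted spans, so the model must learn to recognize the relation from the remaining words alone.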
Approach (cont)
• Train a simple Bayesian model that explains how the data can be generated:
  – Source: generates a relation r with probability P(relation)
  – Channel: given r, generates pairs of words <wi, wj> with probability P(<wi, wj> | r)
  – Decoder: given two bags of words W1, W2 (sentences, clauses), finds the most likely relation

    r* = argmax_r P(r) · P(W1, W2 | r)
       = argmax_r P(r) · ∏_{<wi, wj> ∈ W1 × W2} P(<wi, wj> | r)
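A minimal sketch of this decoder: each of the |W1| × |W2| word pairs contributes one likelihood factor. The priors and word-pair counts below are invented stand-ins for statistics collected from the automatically labeled examples, and the Laplace smoothing is an added simplification:

```python
# Naive Bayes decoder over word pairs, following the model above:
# score(r) = log P(r) + sum over (wi, wj) in W1 x W2 of log P(<wi, wj> | r).
# priors/pair_counts are toy stand-ins for corpus-derived statistics.
import math
from itertools import product

priors = {"CONTRAST": 0.5, "CAUSE-EVIDENCE": 0.5}
pair_counts = {
    "CONTRAST":       {("embargo", "legally"): 30, ("good", "rotten"): 12},
    "CAUSE-EVIDENCE": {("died", "flood"): 25},
}

def decode(w1, w2, vocab_size=10_000):
    """Return the most likely relation between bags of words w1 and w2."""
    best, best_score = None, -math.inf
    for r, prior in priors.items():
        counts = pair_counts[r]
        total = sum(counts.values())
        score = math.log(prior)
        for pair in product(w1, w2):
            # Laplace smoothing over an assumed pair-vocabulary size.
            score += math.log((counts.get(pair, 0) + 1) / (total + vocab_size))
        if score > best_score:
            best, best_score = r, score
    return best

print(decode(["embargo", "states"], ["legally", "buy", "arms"]))  # CONTRAST
```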
Results: accuracy (%) of each pairwise relation classifier (in all cases, the baseline is 50%)

                      CEV   Cond   Elab   No-Rel-Same-Text   No-Rel-Diff-Text
  Contrast             87     74     82          64                 64
  Cause-Evidence              76     93          75                 74
  Condition                          89          69                 71
  Elaboration                                    76                 75
  No-Rel-Same-Text                                                  64
Future work
• Learn to recognize cause/evidence questions
  – Develop a typology of cause/evidence q&a types
• Develop algorithms for cause/evidence answer extraction/production
• Incorporate cause/evidence q&a capability into TextMap
Answering complex questions [Hermjakob, Hovy, Ticrea, Cha]
• Some complex answers have stereotypical content and structure
• "Mini-bio"
  – Defined prototypical biography structure
  – Defined initial typology of biography/person types
  – Implemented prototype system
• "Natural disasters"
  – Defined prototypical structure
  – Defined initial typology of disaster types
Example
• Question: "Who is Clarence Thomas?"
• Old (factoid) answer: "judge"
• New answer: "Clarence Thomas, born 1947/48; judge for the U.S. Court of Appeals for the District of Columbia; nominated to the Supreme Court in 1991 by President Bush; confirmed by the Senate by a narrow 52 to 48 vote."
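One plausible way to carry the prototypical biography structure is a slot-based template that is filled from extracted facts and then rendered as a single answer. The class and slot names below are illustrative guesses, not TextMap's actual biography typology:

```python
# A hypothetical slot-based template for "mini-bio" answers; the slot
# names are illustrative, not TextMap's actual biography typology.
from dataclasses import dataclass, field

@dataclass
class MiniBio:
    name: str
    birth: str = ""        # e.g. "born 1947/48"
    occupation: str = ""   # e.g. "judge for the U.S. Court of Appeals"
    key_events: list = field(default_factory=list)

    def render(self):
        """Assemble the filled slots into one biography-style answer."""
        parts = [self.name, self.birth, self.occupation, *self.key_events]
        return "; ".join(p for p in parts if p)

bio = MiniBio(
    name="Clarence Thomas",
    birth="born 1947/48",
    occupation="judge for the U.S. Court of Appeals for the District of Columbia",
    key_events=["nominated to the Supreme Court in 1991 by President Bush",
                "confirmed by the Senate by a narrow 52 to 48 vote"],
)
print(bio.render())
```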
Future work
• Large-scale tests and refinements of complex answer types
• Creation of biographies and descriptions of natural disasters
  – Possibly only partly coherent
  – Possibly incomplete
Interface and system architecture [Graehl, Knight, and Marcu]
• Three q&a systems run in parallel
  – IR-based
  – Surface text pattern-based
  – Syntax/semantics-based [Webclopedia]
• Answers are presented to the analyst as soon as they become available, as a dynamically ranked list (sketched below)
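A sketch of this arrangement, assuming each engine exposes a call that returns scored candidate answers; the three one-line engines and the score-based re-ranking are placeholders for the real components:

```python
# Run several q&a engines in parallel and surface answers to the analyst
# as each engine finishes; the engine functions are trivial placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

def ir_engine(q):       return [("judge", 0.6)]
def pattern_engine(q):  return [("Clarence Thomas, born 1947/48", 0.8)]
def syntax_engine(q):   return [("judge for the U.S. Court of Appeals", 0.7)]

ENGINES = [ir_engine, pattern_engine, syntax_engine]

def answer(question):
    ranked = []
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = [pool.submit(e, question) for e in ENGINES]
        for fut in as_completed(futures):   # answers arrive as engines finish
            ranked.extend(fut.result())
            ranked.sort(key=lambda a: a[1], reverse=True)  # dynamic re-ranking
            print("current ranking:", ranked)
    return ranked

answer("Who is Clarence Thomas?")
```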
Future work
• Learn to choose between answers produced by different systems
• Log analyst actions for data mining
TextMap: An Intelligent Question-Answering Assistant

The Novel Ideas
• An adaptable, flexible QA system that learns from user interactions
• Advanced rhetorical-, semantics-, and statistics-based question understanding, answering, and indexing
• Advanced representations of the structure of complex multi-part answers
• Answers integrated from multiple sources

Impact
• A high-performance question-answering system capable of:
  – Answering complex questions (biographical and event-related)
  – Answering causal questions using rhetorical parsing
  – Multilingual QA via robust named-entity translation

Milestones/Dates/Status
  Architecture                  Scheduled    Actual
  – Initial interface           JUN 2002     JUN 2002
  Question Types
  – Factoids                    JUN 2002     JUN 2002
  – Structured questions        DEC 2002
  – Causal questions            JUN 2003
  User profiling
  – Initial profiling           DEC 2002
  – Learning preferences        DEC 2003
  Named-entity translation      DEC 2003

http://www.isi.edu/natural-language/textmap.html
PIs: Daniel Marcu, Eduard Hovy, Kevin Knight, USC/ISI
Project COTR: Kellcy Allwein, DIA
Date prepared: Dec 2001

ACQUAINT