TextMap: An Intelligent Question-Answering Assistant

Project Members: Abdessamad Echihabi, Ulf Hermjakob, Eduard Hovy, Kevin Knight, Daniel Marcu, Deepak Ravichandran
Research Foci and Accomplishments
• Increase q&a performance on simple, factoid-type questions
  – Learned surface text patterns for q&a
  – Incorporated pattern-based answering into TextMap
• Develop capability for answering cause/evidence and opinion questions
  – Learned to recognize causal/evidence relations in arbitrary texts
• Develop capability for answering complex questions
  – Answer "who is" questions as mini-biographies
• Design q&a interface and system architecture
  – Multiple q&a engines run in parallel
  – Dynamically ranked answers are presented to analysts as soon as they become available
Learning surface text patterns for q&a [Ravichandran and Hovy; ACL-2002]
• Motivation:
  – Surface text patterns can be used to answer certain factoid questions ("When was NAME born?")
    • <NAME> was born in <BIRTHDATE>
    • <NAME> (<BIRTHDATE> --
• Hypothesis:
  – Surface text patterns can be automatically learned from the Web
Approach
• Start with a small "seed" set of known answers to a given question type
  – "Gandhi 1869", "Newton 1642"
• Download documents from the web that contain the answers in a single sentence
• Use a suffix tree to find common substrings
  – "The great composer Mozart (1756-1791) achieved fame at a young age"
  – "Mozart (1756-1791) was a genius"
  – "The whole world would always be indebted to the great music of Mozart (1756-1791)"
• Replace tags in common substrings
  – <NAME> (<ANSWER> --
  – <NAME> was born in <ANSWER>
• Discard low-frequency patterns and measure precision (see the sketch below)
Initial results – TREC-10 questions

Answers found in the TREC corpus:
  Question type   #questions    MRR
  – Birthyear          8       0.4787
  – Inventors          6       0.1666
  – Discoverers        4       0.1250
  – Definitions      102       0.3445
  – Why-famous         3       0.6666
  – Locations         16       0.75

Answers found on the Web:
  Question type   #questions    MRR
  – Birthyear          8       0.6875
  – Inventors          6       0.5833
  – Discoverers        4       0.8750
  – Definitions      102       0.3857
  – Why-famous         3       0.0
  – Locations         16       0.8643
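For reference, MRR is mean reciprocal rank: each question contributes 1/rank of its first correct answer (0 if none is returned), averaged over the question set. A minimal computation with made-up ranks:

```python
# Mean reciprocal rank: 1/rank of the first correct answer per question,
# 0 when no correct answer is returned. The ranks below are made up.
def mrr(first_correct_ranks):
    return sum(1.0 / r if r else 0.0 for r in first_correct_ranks) / len(first_correct_ranks)

print(mrr([1, 2, None, 1]))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```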
Future work
• Incorporate semantic filters to block answers of the wrong type:
  – Mozart was born in Salzburg
• Learn to deal with long-distance dependencies and answers of some expected length
  – <QUESTION> lies on <ANSWER>
  – "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
Answering cause/evidence questions
• Motivation
  – Question: Why did people die in Burundi?
    • "In Burundi, 179 people died. The flood that hit the capital was the largest ever recorded."
  – Answer: "because of the flood"
  – To produce this answer, we need to identify that a cause/evidence relation holds between the sentences above.
  – The answer is implicit.
Recognizing discourse relations in texts [Marcu and Echihabi; ACL-2002]
• "Such standards would preclude arms sales to states like Libya, which is also currently subject to U.N. embargo."
• "??? states like Rwanda before its present crisis would still be able to legally buy arms."

  BUT

  ¬Can_buy_arms_legally(Libya)
  Can_buy_arms_legally(Rwanda)
  Similar(Libya, Rwanda)

  P(BUT/Contrast | <embargo, legally>) is high
Approach
• Collect a corpus of 1 billion words of English (41M sentences)
• Use simple pattern matching to automatically extract MANY examples of contrast, cause, elaboration, … relations (see the sketch below):
  – [BOS … EOS] [BOS But … EOS]
  – [BOS … ] [but … EOS]
  – [BOS Although … ] [, … EOS]

  Relation                        # of examples
  Contrast                            3,881,588
  Cause-Explanation-Evidence            889,946
  Condition                           1,203,813
  Elaboration                         1,836,227
  No-Relation-Same-Text               1,000,000
  No-Relation-Different-Texts         1,000,000
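A rough sketch of this mining step, assuming sentence-split input. The toy text and the restriction to two Contrast templates (the paper treats both "But" and "Although" as contrast markers) are simplifications of the actual extraction rules:

```python
# Mine relation-labeled training examples with the simple templates above;
# the toy text stands in for the 41M-sentence corpus.
import re

text = ("John is good in math and sciences. But Paul is rotten at both. "
        "Although it was raining, the game went ahead.")

sentences = re.split(r"(?<=[.!?])\s+", text)

examples = []

# [BOS ... EOS] [BOS But ... EOS]  ->  Contrast
for prev, cur in zip(sentences, sentences[1:]):
    if cur.startswith("But "):
        examples.append((prev, cur[len("But "):], "CONTRAST"))

# [BOS Although ...][, ... EOS]  ->  Contrast
for s in sentences:
    m = re.match(r"Although ([^,]+), (.+)", s)
    if m:
        examples.append((m.group(1), m.group(2), "CONTRAST"))

print(examples)
```

The cue phrases themselves are removed from the extracted spans, so the model must learn to recognize the relation from the remaining words alone.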
Approach (cont)
• Train a simple Bayesian model that explains how the data can be generated:
  – Source: generates a relation r with probability P(relation)
  – Channel: given r, generates pairs of words <wi, wj> with probability P(<wi, wj> | r)
  – Decoder: given two bags of words W1, W2 (sentences, clauses), finds the most likely relation

    r* = argmax_r P(r) · P(W1, W2 | r)
       = argmax_r P(r) · ∏_{<wi, wj> ∈ W1 × W2} P(<wi, wj> | r)
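A minimal sketch of this decoder: each of the |W1| × |W2| word pairs contributes one likelihood factor. The priors and word-pair counts below are invented stand-ins for statistics collected from the automatically labeled examples, and the Laplace smoothing is an added simplification:

```python
# Naive Bayes decoder over word pairs, following the model above:
# score(r) = log P(r) + sum over (wi, wj) in W1 x W2 of log P(<wi, wj> | r).
# priors/pair_counts are toy stand-ins for corpus-derived statistics.
import math
from itertools import product

priors = {"CONTRAST": 0.5, "CAUSE-EVIDENCE": 0.5}
pair_counts = {
    "CONTRAST":       {("embargo", "legally"): 30, ("good", "rotten"): 12},
    "CAUSE-EVIDENCE": {("died", "flood"): 25},
}

def decode(w1, w2, vocab_size=10_000):
    """Return the most likely relation between bags of words w1 and w2."""
    best, best_score = None, -math.inf
    for r, prior in priors.items():
        counts = pair_counts[r]
        total = sum(counts.values())
        score = math.log(prior)
        for pair in product(w1, w2):
            # Laplace smoothing over an assumed pair-vocabulary size.
            score += math.log((counts.get(pair, 0) + 1) / (total + vocab_size))
        if score > best_score:
            best, best_score = r, score
    return best

print(decode(["embargo", "states"], ["legally", "buy", "arms"]))  # CONTRAST
```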
Results: accuracy (%) of each pairwise relation classifier (in all cases, the baseline is 50%)

                      CEV   Cond   Elab   No-Rel-Same-Text   No-Rel-Diff-Text
  Contrast             87     74     82          64                 64
  Cause-Evidence              76     93          75                 74
  Condition                          89          69                 71
  Elaboration                                    76                 75
  No-Rel-Same-Text                                                  64
Future work
• Learn to recognize cause/evidence questions
  – Develop a typology of cause/evidence q&a types
• Develop algorithms for cause/evidence answer extraction/production
• Incorporate cause/evidence q&a capability into TextMap
Answering complex questions [Hermjakob, Hovy, Ticrea, Cha]
• Some complex answers have stereotypical content and structure
• "Mini-bio"
  – Defined prototypical biography structure
  – Defined initial typology of biography/person types
  – Implemented prototype system
• "Natural disasters"
  – Defined prototypical structure
  – Defined initial typology of disaster types
Example
• Question: "Who is Clarence Thomas?"
• Old (factoid) answer: "judge"
• New answer: "Clarence Thomas, born 1947/48; judge for the U.S. Court of Appeals for the District of Columbia; nominated to the Supreme Court in 1991 by President Bush; confirmed by the Senate by a narrow 52 to 48 vote."
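One plausible way to carry the prototypical biography structure is a slot-based template that is filled from extracted facts and then rendered as a single answer. The class and slot names below are illustrative guesses, not TextMap's actual biography typology:

```python
# A hypothetical slot-based template for "mini-bio" answers; the slot
# names are illustrative, not TextMap's actual biography typology.
from dataclasses import dataclass, field

@dataclass
class MiniBio:
    name: str
    birth: str = ""        # e.g. "born 1947/48"
    occupation: str = ""   # e.g. "judge for the U.S. Court of Appeals"
    key_events: list = field(default_factory=list)

    def render(self):
        """Assemble the filled slots into one biography-style answer."""
        parts = [self.name, self.birth, self.occupation, *self.key_events]
        return "; ".join(p for p in parts if p)

bio = MiniBio(
    name="Clarence Thomas",
    birth="born 1947/48",
    occupation="judge for the U.S. Court of Appeals for the District of Columbia",
    key_events=["nominated to the Supreme Court in 1991 by President Bush",
                "confirmed by the Senate by a narrow 52 to 48 vote"],
)
print(bio.render())
```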
Future work
• Large-scale tests and refinements of complex answer types
• Creation of biographies and descriptions of natural disasters
  – Possibly only partly coherent
  – Possibly incomplete
Interface and system architecture [Graehl, Knight, and Marcu]
• Three q&a systems run in parallel
  – IR-based
  – Surface text pattern-based
  – Syntax/semantics-based [Webclopedia]
• Answers are presented to the analyst as soon as they become available, as a dynamically ranked list (sketched below)
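A sketch of this arrangement, assuming each engine exposes a call that returns scored candidate answers; the three one-line engines and the score-based re-ranking are placeholders for the real components:

```python
# Run several q&a engines in parallel and surface answers to the analyst
# as each engine finishes; the engine functions are trivial placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

def ir_engine(q):       return [("judge", 0.6)]
def pattern_engine(q):  return [("Clarence Thomas, born 1947/48", 0.8)]
def syntax_engine(q):   return [("judge for the U.S. Court of Appeals", 0.7)]

ENGINES = [ir_engine, pattern_engine, syntax_engine]

def answer(question):
    ranked = []
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = [pool.submit(e, question) for e in ENGINES]
        for fut in as_completed(futures):   # answers arrive as engines finish
            ranked.extend(fut.result())
            ranked.sort(key=lambda a: a[1], reverse=True)  # dynamic re-ranking
            print("current ranking:", ranked)
    return ranked

answer("Who is Clarence Thomas?")
```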
Future work
• Learn to choose between answers produced by different systems
• Log analyst actions for data mining
TextMap: An Intelligent Question-Answering Assistant

The Novel Ideas
• An adaptable, flexible QA system that learns from user interactions
• Advanced rhetorical-, semantics-, and statistics-based question understanding, answering, and indexing
• Advanced representations of the structure of complex multi-part answers
• Answers integrated from multiple sources

Impact
• A high-performance question-answering system capable of:
  – Answering complex questions (biographical and event-related)
  – Answering causal questions using rhetorical parsing
  – Multilingual QA via robust named-entity translation

Milestones/Dates/Status
  Architecture                  Scheduled    Actual
  – Initial interface           JUN 2002     JUN 2002
  Question Types
  – Factoids                    JUN 2002     JUN 2002
  – Structured questions        DEC 2002
  – Causal questions            JUN 2003
  User profiling
  – Initial profiling           DEC 2002
  – Learning preferences        DEC 2003
  Named-entity translation      DEC 2003

http://www.isi.edu/natural-language/textmap.html
PIs: Daniel Marcu, Eduard Hovy, Kevin Knight, USC/ISI
Project COTR: Kellcy Allwein, DIA
Date prepared: Dec 2001

ACQUAINT