
Page 1: Natural Language Question Answering

Natural Language Question Answering

Gary Geunbae Lee, NLP Lab.

Professor@POSTECH, Chief Scientist@DiQuest

Kiss_tutorial 2

Page 2: Natural Language Question Answering

Dan Jurafsky

2

Question Answering

One of the oldest NLP tasks (punched card systems in 1961)

Simmons, Klein, McConlogue. 1964. Indexing and Dependency Logic for Answering English Questions. American Documentation 15:30, 196-204

Page 3: Natural Language Question Answering

Dan Jurafsky

Question Answering: IBM’s Watson

• Won Jeopardy on February 16, 2011!

3

WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDAVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL

Bram Stoker

Page 4: Natural Language Question Answering

Dan Jurafsky

Apple’s Siri

4

Page 5: Natural Language Question Answering

Dan Jurafsky

Wolfram Alpha

5

Page 6: Natural Language Question Answering

Gary G. Lee, Postech

6

Task Definition

Open-domain QA
• Find answers to open-domain natural language questions from large collections of documents
• Questions are usually factoid
• Documents include texts, web pages, databases, multimedia, maps, etc.

Canned QA
• Map new questions onto predefined questions for which answers already exist
• Example: FAQ Finder

Page 7: Natural Language Question Answering

Gary G. Lee, Postech

7

Example (1)

Q: What is the fastest car in the world?

Correct answer: "…, the Jaguar XJ220 is the dearest (Pounds 415,000), fastest (217 mph / 350 km/h) and most sought-after car in the world."

Wrong answer: "… will stretch Volkswagen's lead in the world's fastest-growing vehicle market. Demand is expected to soar in the next few years as more Chinese are able to afford their own cars."

Page 8: Natural Language Question Answering

Gary G. Lee, Postech

8

Example (2)

Q: How did Socrates die?

Answer: "… Socrates drank poisoned wine."

Needs world knowledge and reasoning: anyone drinking or eating something that is poisoned is likely to die.

Page 9: Natural Language Question Answering

Gary G. Lee, Postech

9

A taxonomy of QA systems

Class 1: Factual
  Q: What is the largest city in Germany?
  A: "… Berlin, the largest city in Germany, …"

Class 2: Simple reasoning
  Q: How did Socrates die?
  A: "… Socrates poisoned himself …"

Class 3: Fusion list
  Q: What are the arguments for and against prayer in school?
  Answer assembled across several texts

Class 4: Interactive context
  Clarification questions

Class 5: Speculative
  Q: Should the Fed raise interest rates at their next meeting?
  Answer provides analogies to past actions

Moldovan et al., ACL 2002

Page 10: Natural Language Question Answering

Gary G. Lee, Postech

10

Enabling Technologies and Applications

Enabling technologies:
• POS tagger
• Parser
• WSD
• Named entity tagger
• Information retrieval
• Information extraction
• Inference engine
• Ontology (WordNet)
• Knowledge acquisition
• Knowledge classification
• Language generation

Applications:
• Smart agents
• Situation management
• E-commerce
• Summarization
• Tutoring / learning
• Personal assistant in business
• On-line documentation
• On-line troubleshooting
• Semantic Web

Page 11: Natural Language Question Answering

Gary G. Lee, Postech

11

Question Taxonomies

Wendy Lehnert's taxonomy
• 13 question categories
• Based on Schank's Conceptual Dependency

Arthur Graesser's taxonomy
• 18 question categories, expanding Lehnert's categories
• Based on speech act theory

Question types in LASSO [Moldovan et al., TREC-8]
• Combines question stems, question focus, and phrasal heads
• Unifies the question class with the answer type via the question focus

Page 12: Natural Language Question Answering

Gary G. Lee, Postech

12

Earlier QA Systems

• The QUALM system (Wendy Lehnert, 1978): reads stories and answers questions about what was read
• The LUNAR system (W. Woods, 1977): one of the first user evaluations of question answering systems
• The STUDENT system (Daniel Bobrow, 1964): solved high school algebra word problems
• The MURAX system (Julian Kupiec, 1993): uses an online encyclopedia for closed-class questions
• The START system (Boris Katz, 1997): uses annotations to process questions from the Web
• FAQ Finder (Robin Burke, 1997): uses Frequently Asked Questions from Usenet newsgroups

Page 13: Natural Language Question Answering

Gary G. Lee, Postech

13

Recent QA Conferences

• TREC-8: 200 questions; 2 GB collection; questions written from the text; 41 participating systems; top performance 64.5%; 50 & 250-byte answers; evaluated as Top 5, MRR.
• TREC-9: 693 questions; 3 GB collection; questions from query logs; 75 systems; top performance 76%; 50 & 250-byte answers; Top 5, MRR.
• TREC-10: 500 questions; 3 GB collection; query logs; 92 systems; top performance 69%; 50-byte answers; Top 5, MRR; main, list, and context tasks; null answers allowed.
• TREC-11: 500 questions; 3 GB collection (AQUAINT); query logs; 76 systems; top performance 85.6%; exact answers; Top 1, CW; main and list tasks; null answers allowed.
• NTCIR-3: 200 questions; 280 MB collection; questions written from the text; 36 systems; top performance 60.8%; exact answers; Top 5, MRR; Tasks 1-3.

Page 14: Natural Language Question Answering

Dan Jurafsky

14

Types of Questions in Modern Systems

• Factoid questions
  • Who wrote "The Universal Declaration of Human Rights"?
  • How many calories are there in two slices of apple pie?
  • What is the average age of the onset of autism?
  • Where is Apple Computer based?

• Complex (narrative) questions
  • In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
  • What do scholars think about Jefferson's position on dealing with pirates?

Page 15: Natural Language Question Answering

Dan Jurafsky

Commercial systems: mainly factoid questions

Where is the Louvre Museum located?

In Paris, France

What’s the abbreviation for limited partnership?

L.P.

What are the names of Odin’s ravens?

Huginn and Muninn

What currency is used in China?

The yuan

What kind of nuts are used in marzipan?

almonds

What instrument does Max Roach play?

drums

What is the telephone number for Stanford University?

650-723-2300

Page 16: Natural Language Question Answering

Dan Jurafsky

Paradigms for QA

• IR-based approaches
  • TREC; IBM Watson; Google

• Knowledge-based and hybrid approaches
  • IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge (Evi)

16

Page 17: Natural Language Question Answering

Dan Jurafsky

Many questions can already be answered by web search


17

Page 18: Natural Language Question Answering

Dan Jurafsky

IR-based Question Answering


18

Page 19: Natural Language Question Answering

Dan Jurafsky

19

IR-based Factoid QA

Page 20: Natural Language Question Answering

Dan Jurafsky

IR-based Factoid QA

• QUESTION PROCESSING
  • Detect question type, answer type, focus, relations
  • Formulate queries to send to a search engine
• PASSAGE RETRIEVAL
  • Retrieve ranked documents
  • Break into suitable passages and rerank
• ANSWER PROCESSING
  • Extract candidate answers
  • Rank candidates using evidence from the text and external sources
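A toy end-to-end illustration of these three stages in Python follows. This is a minimal sketch, not any real system's code: the two-passage corpus, the regex-based type detector, and the extraction patterns are all invented for demonstration.

import re

CORPUS = [
    "Manmohan Singh, Prime Minister of India, met left leaders today.",
    "The official height of Mount Everest is 29,035 feet.",
]

def detect_answer_type(question):
    """QUESTION PROCESSING: crude rule-based answer-type detection."""
    if re.match(r"(?i)who\b", question):
        return "PERSON"
    if re.match(r"(?i)how tall\b", question):
        return "LENGTH"
    return "OTHER"

def retrieve_passages(question):
    """PASSAGE RETRIEVAL: rank passages by query-word overlap."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def overlap(passage):
        return len(q_words & set(re.findall(r"\w+", passage.lower())))
    return sorted(CORPUS, key=overlap, reverse=True)

def extract_answer(passage, answer_type):
    """ANSWER PROCESSING: pull out a string of the right type."""
    patterns = {
        "PERSON": r"[A-Z][a-z]+ [A-Z][a-z]+",       # naive two-word name
        "LENGTH": r"[\d,]+ (?:feet|meters|miles)",  # number + length unit
    }
    match = re.search(patterns.get(answer_type, r"$^"), passage)
    return match.group(0) if match else None

def answer(question):
    a_type = detect_answer_type(question)
    for passage in retrieve_passages(question):
        candidate = extract_answer(passage, a_type)
        if candidate:
            return candidate

print(answer("Who is the prime minister of India?"))  # Manmohan Singh
print(answer("How tall is Mt. Everest?"))             # 29,035 feet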

Page 21: Natural Language Question Answering

Dan Jurafsky

Knowledge-based approaches (Siri)

• Build a semantic representation of the query
  • Times, dates, locations, entities, numeric quantities
• Map from this semantics to query structured data or resources
  • Geospatial databases
  • Ontologies (Wikipedia infoboxes, DBpedia, WordNet, Yago)
  • Restaurant review sources and reservation services
  • Scientific databases

21

Page 22: Natural Language Question Answering

Dan Jurafsky

Hybrid approaches (IBM Watson)

• Build a shallow semantic representation of the query
• Generate answer candidates using IR methods
  • Augmented with ontologies and semi-structured data
• Score each candidate using richer knowledge sources
  • Geospatial databases
  • Temporal reasoning
  • Taxonomical classification

22

Page 23: Natural Language Question Answering

Dan Jurafsky

Factoid Q/A

23

Page 24: Natural Language Question Answering

Dan Jurafsky

Question Processing
Things to extract from the question

• Answer Type Detection
  • Decide the named entity type (person, place) of the answer
• Query Formulation
  • Choose query keywords for the IR system
• Question Type Classification
  • Is this a definition question, a math question, a list question?
• Focus Detection
  • Find the question words that are replaced by the answer
• Relation Extraction
  • Find relations between entities in the question

24

Page 25: Natural Language Question Answering

Dan Jurafsky

Question Processing

"They're the two states you could be reentering if you're crossing Florida's northern border"

• Answer Type: US state
• Query: two states, border, Florida, north
• Focus: the two states
• Relations: borders(Florida, ?x, north)

25

Page 26: Natural Language Question Answering

Dan Jurafsky

Answer Type Detection: Named Entities

• Who founded Virgin Airlines?
  • PERSON
• What Canadian city has the largest population?
  • CITY

Page 27: Natural Language Question Answering

Dan Jurafsky

Answer Type Taxonomy

• 6 coarse classes
  • ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
• 50 finer classes
  • LOCATION: city, country, mountain, …
  • HUMAN: group, individual, title, description
  • ENTITY: animal, body, color, currency, …

27

Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING'02

Page 28: Natural Language Question Answering

Dan Jurafsky

28

Part of Li & Roth’s Answer Type Taxonomy

Page 29: Natural Language Question Answering

Dan Jurafsky

29

Answer Types

Page 30: Natural Language Question Answering

Dan Jurafsky

30

More Answer Types

Page 31: Natural Language Question Answering

Dan Jurafsky

Answer types in Jeopardy

• 2,500 answer types in a sample of 20,000 Jeopardy questions
• The most frequent 200 answer types cover < 50% of the data
• The 40 most frequent Jeopardy answer types: he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game

31

Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine. Fall 2010. 59-79.

Page 32: Natural Language Question Answering

Dan Jurafsky

Answer Type Detection

• Hand-written rules
• Machine learning
• Hybrids

Page 33: Natural Language Question Answering

Dan Jurafsky

Answer Type Detection

• Regular expression-based rules can get some cases:
  • Who {is|was|are|were} PERSON
  • PERSON (YEAR – YEAR)
• Other rules use the question headword, i.e. the headword of the first noun phrase after the wh-word:
  • Which city in China has the largest number of foreign financial companies?
  • What is the state flower of California?
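A minimal sketch of this rule-based route in Python: a couple of hand-written regexes plus a crude headword fallback. The headword extraction below is a rough heuristic (a real system parses the question), and the headword-to-type table, stopword set, and boundary set are invented toys.

import re

RULES = [
    (r"(?i)^who (is|was|are|were)\b", "PERSON"),
    (r"(?i)^when\b", "DATE"),
]

HEADWORD_TYPES = {"city": "CITY", "flower": "PLANT", "population": "NUMBER"}

STOP = {"which", "what", "who", "is", "was", "are", "the", "a", "an"}
BOUNDARY = {"of", "in", "for", "with", "has", "does", "do"}

def headword(question):
    """Last word of the first noun-phrase-like run after the wh-word."""
    np = []
    for w in re.findall(r"[A-Za-z]+", question.lower()):
        if w in STOP and not np:
            continue            # skip the leading wh-word and auxiliaries
        if w in BOUNDARY or (w in STOP and np):
            break               # end of the first noun phrase
        np.append(w)
    return np[-1] if np else None

def answer_type(question):
    for pattern, a_type in RULES:
        if re.search(pattern, question):
            return a_type
    return HEADWORD_TYPES.get(headword(question), "UNKNOWN")

print(answer_type("Who was Queen Victoria's second son?"))     # PERSON
print(answer_type("Which city in China has the largest number "
                  "of foreign financial companies?"))          # CITY
print(answer_type("What is the state flower of California?"))  # PLANT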

Page 34: Natural Language Question Answering

Dan Jurafsky

Answer Type Detection

• Most often, we treat the problem as machine learning classification (sketched below)
  • Define a taxonomy of question types
  • Annotate training data for each question type
  • Train classifiers for each question class using a rich set of features
    • Features include those hand-written rules!

34
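A minimal sketch of the classification route, assuming scikit-learn is available. The six training questions are toy data standing in for a large annotated set such as Li & Roth's; a real system would also add the parse, headword, and named-entity features listed on the next slide.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: six labeled questions, three coarse classes.
questions = [
    "Who founded Virgin Airlines?",
    "Who wrote Hamlet?",
    "Where is the Louvre Museum located?",
    "What city hosted the 1988 Olympics?",
    "When did World War II end?",
    "How many calories are in two slices of apple pie?",
]
labels = ["HUMAN", "HUMAN", "LOCATION", "LOCATION", "NUMERIC", "NUMERIC"]

# Word and bigram features only; hand-written rules could be added as
# extra boolean feature columns.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(questions, labels)

print(clf.predict(["Who was Queen Victoria's second son?"]))
# ['HUMAN'] (with this toy data)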

Page 35: Natural Language Question Answering

Dan Jurafsky

Features for Answer Type Detection

• Question words and phrases
• Part-of-speech tags
• Parse features (headwords)
• Named entities
• Semantically related words

35

Page 36: Natural Language Question Answering

Dan Jurafsky

Factoid Q/A

36

Page 37: Natural Language Question Answering

Dan Jurafsky

Keyword Selection Algorithm

1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with their adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select all adverbs
9. Select the QFW word (skipped in all previous steps)
10. Select all other words

Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Richard Goodrum, Roxana Girju and Vasile Rus. 1999. LASSO: A Tool for Surfing the Answer Net. Proceedings of TREC-8.
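A pared-down Python approximation of this algorithm. Only the tiers that need no POS tagger or parser are kept: quoted non-stop words (step 1), capitalized proper-noun-like words (roughly step 2), and remaining content words as a catch-all tier; the stopword list is a small invented stand-in.

import re

STOPWORDS = {"the", "a", "an", "in", "of", "his", "her", "who", "what",
             "did", "do", "does", "is", "was", "to", "and"}

def keyword_tiers(question):
    """Assign each keyword the lowest (best) tier that selects it."""
    tiers = {}
    # Step 1: non-stop words inside quotation marks
    quoted = " ".join(re.findall(r'"([^"]+)"', question))
    for word in re.findall(r"[A-Za-z]+", quoted):
        if word.lower() not in STOPWORDS:
            tiers.setdefault(word, 1)
    # Roughly step 2: capitalized, proper-noun-like words
    for word in re.findall(r"\b[A-Z][a-z]+\b", question):
        if word.lower() not in STOPWORDS:
            tiers.setdefault(word, 2)
    # Catch-all: remaining content words (stands in for steps 3-10)
    for word in re.findall(r"[A-Za-z]+", question):
        if word.lower() not in STOPWORDS:
            tiers.setdefault(word, 6)
    return sorted(tiers.items(), key=lambda kv: kv[1])

q = 'Who coined the term "cyberspace" in his novel "Neuromancer"?'
print(keyword_tiers(q))
# [('cyberspace', 1), ('Neuromancer', 1), ('coined', 6), ('term', 6),
#  ('novel', 6)]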

Page 38: Natural Language Question Answering

Dan Jurafsky

Choosing keywords from the query

38

Who coined the term “cyberspace” in his novel “Neuromancer”?

Keyword priorities: cyberspace/1, Neuromancer/1, term/4, novel/4, coined/7

Slide from Mihai Surdeanu

Page 39: Natural Language Question Answering

Dan Jurafsky

Factoid Q/A

39

Page 40: Natural Language Question Answering

Dan Jurafsky

40

Passage Retrieval

• Step 1: IR engine retrieves documents using query terms
• Step 2: Segment the documents into shorter units, something like paragraphs
• Step 3: Passage ranking: use the answer type to help rerank passages

Page 41: Natural Language Question Answering

Dan Jurafsky

Features for Passage Ranking

• Number of named entities of the right type in the passage
• Number of query words in the passage
• Number of question N-grams also in the passage
• Proximity of query keywords to each other in the passage
• Longest sequence of question words in the passage
• Rank of the document containing the passage

Used either in rule-based classifiers or with supervised machine learning; a few of these are sketched below.
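Four of these features computed with plain string matching over tokenized words; counting named entities of the right type is omitted because it needs a typed NE tagger. The feature names are ad hoc.

import re

def contains_run(passage_words, run):
    """True if `run` appears as a contiguous word sequence in the passage."""
    n = len(run)
    return any(passage_words[k:k + n] == run
               for k in range(len(passage_words) - n + 1))

def passage_features(question, passage):
    q = re.findall(r"\w+", question.lower())
    p = re.findall(r"\w+", passage.lower())

    features = {}
    # Number of query words in the passage
    features["query_words"] = len(set(q) & set(p))
    # Number of question bigrams also in the passage
    features["bigrams"] = len(set(zip(q, q[1:])) & set(zip(p, p[1:])))
    # Longest sequence of question words appearing contiguously in the passage
    features["longest_seq"] = max(
        (j - i for i in range(len(q)) for j in range(i + 1, len(q) + 1)
         if contains_run(p, q[i:j])), default=0)
    # Proximity: span (in words) from the first to the last matched query word
    pos = [i for i, w in enumerate(p) if w in set(q)]
    features["window"] = (max(pos) - min(pos) + 1) if pos else None
    return features

print(passage_features(
    "Who was Queen Victoria's second son?",
    "Alfred, the second son of Queen Victoria and Prince Albert."))
# {'query_words': 4, 'bigrams': 2, 'longest_seq': 2, 'window': 5}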

Page 42: Natural Language Question Answering

Dan Jurafsky

Factoid Q/A

42

Page 43: Natural Language Question Answering

Dan Jurafsky

Answer Extraction

• Run an answer-type named-entity tagger on the passages
  • Each answer type requires a named-entity tagger that detects it
  • If the answer type is CITY, the tagger has to tag CITY
  • Can be full NER, simple regular expressions, or a hybrid
• Return the string with the right type:
  • Who is the prime minister of India? (PERSON)
    "Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated."
  • How tall is Mt. Everest? (LENGTH)
    "The official height of Mount Everest is 29,035 feet."
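The "simple regular expressions" route, sketched in Python with one illustrative pattern per answer type and applied to the two passages above. Note that the PERSON pattern over-matches, which is exactly why full NER helps.

import re

TYPE_PATTERNS = {
    "PERSON": r"[A-Z][a-z]+(?: [A-Z][a-z]+)+",            # naive proper name
    "LENGTH": r"[\d,]+(?:\.\d+)? ?(?:feet|ft|meters)\b",  # number + unit
}

def extract(passage, answer_type):
    """Return all strings in the passage that match the answer type."""
    return re.findall(TYPE_PATTERNS[answer_type], passage)

print(extract("Manmohan Singh, Prime Minister of India, had told left "
              "leaders that the deal would not be renegotiated.", "PERSON"))
# ['Manmohan Singh', 'Prime Minister']  (the second is a false positive)

print(extract("The official height of Mount Everest is 29,035 feet.",
              "LENGTH"))
# ['29,035 feet']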

Page 44: Natural Language Question Answering

Dan Jurafsky

Ranking Candidate Answers

• But what if there are multiple candidate answers!

Q: Who was Queen Victoria's second son?
• Answer Type: PERSON
• Passage: "The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert"


Page 46: Natural Language Question Answering

Dan Jurafsky

Use machine learning: features for ranking candidate answers

• Answer type match: candidate contains a phrase with the correct answer type
• Pattern match: a regular expression pattern matches the candidate
• Question keywords: number of question keywords in the candidate
• Keyword distance: distance in words between the candidate and the query keywords
• Novelty factor: a word in the candidate is not in the query
• Apposition features: the candidate is an appositive to question terms
• Punctuation location: the candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark
• Sequences of question terms: the length of the longest sequence of question terms that occurs in the candidate answer
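Three of these features sketched in Python (keyword distance, novelty factor, punctuation location), computed for the two person candidates in the Queen Victoria passage. The simplifications (first occurrence only, exact word match) are ours.

import re

def candidate_features(candidate, passage, keywords):
    words = re.findall(r"\w+", passage.lower())
    cand = re.findall(r"\w+", candidate.lower())
    cand_pos = words.index(cand[0])  # first occurrence only, for simplicity

    # Keyword distance: words between the candidate and the nearest keyword
    kw_pos = [i for i, w in enumerate(words) if w in keywords]
    distance = min(abs(i - cand_pos) for i in kw_pos) if kw_pos else None

    # Novelty factor: candidate contains a word that is not in the query
    novelty = any(w not in keywords for w in cand)

    # Punctuation location: candidate immediately followed by , . ; ! or "
    following = passage.split(candidate, 1)[-1][:1]
    punctuated = following in {",", ".", ";", "!", '"'}

    return {"distance": distance, "novelty": novelty, "punct": punctuated}

passage = ("The Marie biscuit is named after Marie Alexandrovna, the "
           "daughter of Czar Alexander II of Russia and wife of Alfred, "
           "the second son of Queen Victoria and Prince Albert")
kw = {"queen", "victoria", "second", "son"}
print(candidate_features("Alfred", passage, kw))              # distance 2
print(candidate_features("Marie Alexandrovna", passage, kw))  # much farther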

Page 47: Natural Language Question Answering

Dan Jurafsky

Candidate Answer scoring in IBM Watson

• Each candidate answer gets scores from more than 50 components (from unstructured text, semi-structured text, triple stores):
  • Logical form (parse) match between question and candidate
  • Passage source reliability
  • Geospatial location: California is "southwest of Montana"
  • Temporal relationships
  • Taxonomic classification

47

Page 48: Natural Language Question Answering

Dan Jurafsky

48

Common Evaluation Metrics

1. Accuracy: does the answer match the gold-labeled answer?
2. Mean Reciprocal Rank (MRR)
   • For each query, return a ranked list of M candidate answers
   • A query's score is 1/rank of the first right answer
   • Take the mean over all N queries
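In symbols, MRR = (1/N) * sum over queries of (1/rank_i), where rank_i is the rank of the first correct answer for query i and the term is 0 when no correct answer is returned. A direct implementation:

def mrr(ranked_answers, gold):
    """Mean Reciprocal Rank over N queries: for each query, score 1/rank
    of the first correct answer in its candidate list (0 if absent)."""
    total = 0.0
    for candidates, answer in zip(ranked_answers, gold):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
    return total / len(gold)

# Correct answer at rank 1, at rank 3, and missing entirely:
runs = [["Paris", "Lyon"], ["Berlin", "Bonn", "Hamburg"], ["Oslo", "Bergen"]]
gold = ["Paris", "Hamburg", "Stockholm"]
print(mrr(runs, gold))  # (1/1 + 1/3 + 0) / 3 = 0.444...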

Page 49: Natural Language Question Answering

Gary G. Lee, Postech

49

A Generic QA Architecture

Question → [Question Processing] → query → [Document Retrieval] → documents → [Answer Extraction & Formulation] → Answers

The retrieval component searches the document collection; all components consult shared resources (dictionary, ontology, …).

Page 50: Natural Language Question Answering

Gary G. Lee, Postech

50

IBM - Statistical QA

• 31 answer categories
• Named entity tagging
• Focus expansion using WordNet
• Query expansion using an encyclopedia
• Dependency relation matching
• Uses a maximum entropy model in both answer type prediction and answer selection

Pipeline: Question → Query & Focus Expansion → Answer Type Prediction → IR → Answer Selection → Answers; the predicted answer type and named-entity marking feed answer selection.

Ittycheriah et al. TREC-9, TREC-10

Page 51: Natural Language Question Answering

Gary G. Lee, Postech

51

USC/ISI - Webclopedia

• Parses questions and top document segments with CONTEX
• Query expansion using WordNet
• Scores each sentence using word scores
• Matches constraint patterns and parse trees
• Matches semantic types and parse tree elements

Pipeline: Question → Parsing (CONTEX, guided by a QA typology) → Create Query → IR → Select & rank sentences → Parsing → Matching (against constraint patterns) → Inference → Ranking → Answers.

Hovy et al. TREC-9, TREC-10

Page 52: Natural Language Question Answering

Gary G. Lee, Postech

52

LCC - PowerAnswer
Moldovan et al. ACL-02

Modules (Question → … → Answers):
• M1: Keyword pre-processing (split/bind/spell)
• M2: Construction of the question representation
• M3: Derivation of the expected answer type
• M4: Keyword selection
• M5: Keyword expansion
• M6: Actual retrieval of documents and passages
• M7: Passage post-filtering
• M8: Identification of candidate answers
• M8': Logic proving
• M9: Answer ranking
• M10: Answer formulation

Three feedback loops (Loop 1, Loop 2, Loop 3) route retrieval results back into earlier modules.

Page 53: Natural Language Question Answering

Gary G. Lee, Postech

53

LCC – PowerAnswer (cont'd)

Keyword alternations
• Morphological alternations: invent → inventor
• Lexical alternations: How far → distance
• Semantic alternations: like → prefer

Three feedback loops
• Loop 1: adjust Boolean queries
• Loop 2: lexical alternations
• Loop 3: semantic alternations

Logic proving
• Unification between the question and answer logic forms

Page 54: Natural Language Question Answering

Dan Jurafsky

Relation Extraction

• Answers: databases of relations
  • born-in("Emma Goldman", "June 27 1869")
  • author-of("Cao Xue Qin", "Dream of the Red Chamber")
  • Drawn from Wikipedia infoboxes, DBpedia, Freebase, etc.
• Questions: extracting relations in questions
  • Whose granddaughter starred in E.T.?
    (acted-in ?x "E.T.") (granddaughter-of ?x ?y)

54
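A toy Python triple store showing how the two extracted relations can be joined to answer the question. The Barrymore facts are real; the store and query interface are invented for illustration.

TRIPLES = [
    ("acted-in", "Drew Barrymore", "E.T."),
    ("granddaughter-of", "Drew Barrymore", "John Barrymore"),
    ("author-of", "Cao Xue Qin", "Dream of the Red Chamber"),
]

def query(rel, arg1=None, arg2=None):
    """Match one pattern against the store; None acts as a variable."""
    return [(r, a, b) for (r, a, b) in TRIPLES
            if r == rel and arg1 in (None, a) and arg2 in (None, b)]

# Whose granddaughter starred in E.T.?
#   (acted-in ?x "E.T.")  joined with  (granddaughter-of ?x ?y)  -> ?y
for _, x, _ in query("acted-in", arg2="E.T."):
    for _, _, y in query("granddaughter-of", arg1=x):
        print(y)  # John Barrymore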

Page 55: Natural Language Question Answering

Dan Jurafsky

Temporal Reasoning

• Relation databases (and obituaries, biographical dictionaries, etc.)
• IBM Watson: "In 1594 he took a job as a tax collector in Andalusia"
  Candidates:
  • Thoreau is a bad answer (born in 1817)
  • Cervantes is possible (was alive in 1594)

55
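This Watson-style date check reduces to a lifespan filter, sketched below; the lifespans are real, everything else is illustrative.

# Discard candidates whose lifespan is incompatible with a date in the clue.
LIFESPAN = {"Cervantes": (1547, 1616), "Thoreau": (1817, 1862)}

def alive_in(candidate, year):
    born, died = LIFESPAN[candidate]
    return born <= year <= died

clue_year = 1594  # "In 1594 he took a job as a tax collector in Andalusia"
for candidate in ["Cervantes", "Thoreau"]:
    verdict = "possible" if alive_in(candidate, clue_year) else "ruled out"
    print(candidate, verdict)  # Cervantes possible / Thoreau ruled out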

Page 56: Natural Language Question Answering

Dan Jurafsky

Geospatial knowledge (containment, directionality, borders)

• Beijing is a good answer for "Asian city"
• California is "southwest of Montana"
• Sources such as geonames.org

56

Page 57: Natural Language Question Answering

Dan Jurafsky

Context and Conversation in Virtual Assistants like Siri

• Coreference helps resolve ambiguities
  U: "Book a table at Il Fornaio at 7:00 with my mom"
  U: "Also send her an email reminder"
• Clarification questions
  U: "Chicago pizza"
  S: "Did you mean pizza restaurants in Chicago or Chicago-style pizza?"

57

Page 58: Natural Language Question Answering

Dan Jurafsky

Answering harder questions

Q: What is water spinach?

A: Water spinach (Ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.

Page 59: Natural Language Question Answering

Dan Jurafsky

Answering harder questions

Q: In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever?

A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMedID: 1621668, Evidence Strength: A)

Page 60: Natural Language Question Answering

Dan Jurafsky

Answering harder questions via query-focused summarization

• The (bottom-up) snippet method
  • Find a set of relevant documents
  • Extract informative sentences from the documents (using tf-idf, MMR)
  • Order and modify the sentences into an answer
• The (top-down) information extraction method
  • Build specific answerers for different question types: definition questions, biography questions, certain medical questions

Page 61: Natural Language Question Answering

Dan Jurafsky

The Information Extraction method

• A good biography of a person contains:
  • the person's birth/death, fame factor, education, nationality, and so on
• A good definition contains:
  • a genus or hypernym, e.g. "The Hajj is a type of ritual"
• A medical answer about a drug's use contains:
  • the problem (the medical condition), the intervention (the drug or procedure), and the outcome (the result of the study)

Page 62: Natural Language Question Answering

Dan Jurafsky

Information that should be in the answer for 3 kinds of questions

Page 63: Natural Language Question Answering

Dan Jurafsky

Architecture for complex question answering: definition questions

S. Blair-Goldensohn, K. McKeown and A. Schlaikjer. 2004. Answering Definition Questions: A Hybrid Approach.

Page 64: Natural Language Question Answering

Gary G. Lee, Postech

64

Postech QA: Answer Type

Top 18 types, 84 in total:

DATE, TIME, PROPERTY, LANGUAGE_UNIT, LANGUAGE, SYMBOLIC_REP, ACTION, ACTIVITY, LIFE_FORM, QUANTITY, NATURAL_OBJECT, LOCATION, SUBSTANCE, ARTIFACT, GROUP, PHENOMENON, STATUS, BODY_PART

Finer types include: AGE, DISTANCE, DURATION, LENGTH, MONETARY_VALUE, NUMBER, PERCENTAGE, POWER, RATE, SIZE, SPEED, TEMPERATURE, VOLUME, WEIGHT, …

SiteQ system

Page 65: Natural Language Question Answering

Gary G. Lee, Postech

65

Layout of SiteQ/J

Part-of-speech tagging (POSTAG/J) → Document retrieval (POSNIR/J) → Passage selection → Chunking & category processing → Identification of answer type / Detection of answer candidates → Answer filtering, scoring & ranking

• Passage selection: divide documents into dynamic passages; score passages
• Chunking & category processing: NP/VP chunking; syntactic/semantic category check
• Identification of answer type: locate the question word; LSP matching; date constraint check
• Detection of answer candidates: LSP matching

Resources: semantic category dictionary, LSP grammars for questions and answers, index DB.

SiteQ system

Page 66: Natural Language Question Answering

Gary G. Lee, Postech

66

Question Processing

question → Tagger → NP Chunker → Normalizer → RE Matcher → Answer Type Determiner / Query Formatter → query + answer type

Resources: Qnorm dictionary, LSP grammar.

SiteQ system

Page 67: Natural Language Question Answering

Gary G. Lee, Postech

67

Lexico-Semantic Patterns

• An LSP is a pattern expressed in terms of lexical entries, POS tags, syntactic categories, and semantic categories; it is more flexible than a surface pattern
• Used for:
  • identification of the answer type of a question
  • detection of answer candidates in selected passages

SiteQ system

Page 68: Natural Language Question Answering

Gary G. Lee, Postech

68

Answer Type Determination (1)

Who was the 23rd president of the United States?

• Tagger: Who/WP was/VBD the/DT 23rd/JJ president/NN of/IN the/DT United/NP States/NPS ?/SENT
• NP Chunker: Who/WP was/VBD [[ the/DT 23rd/JJ president/NN ] of/IN [ the/DT United/NP States/NPS ]] ?/SENT
• Normalizer: %who%be@person
• RE Matcher: (%who)(%be)(@person) → PERSON

SiteQ system
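A toy re-creation of this LSP match in Python: question words are normalized to lexical (%...) and semantic (@...) symbols, and the resulting symbol string is matched against pattern rules. The two tiny lexicons are invented stand-ins for SiteQ's Qnorm dictionary and semantic category dictionary.

import re

QNORM = {"who": "%who", "was": "%be", "is": "%be"}
SEMCAT = {"president": "@person", "inventor": "@person", "city": "@location"}

RULES = [(r"(%who)(%be)(@person)", "PERSON"),
         (r"(%who)(%be)(@location)", "LOCATION")]

def normalize(question):
    """Map each known word to its LSP symbol; drop everything else."""
    symbols = []
    for word in re.findall(r"[A-Za-z0-9]+", question.lower()):
        if word in QNORM:
            symbols.append(QNORM[word])
        elif word in SEMCAT:
            symbols.append(SEMCAT[word])
    return "".join(symbols)

def answer_type(question):
    normalized = normalize(question)        # e.g. "%who%be@person"
    for pattern, a_type in RULES:
        if re.fullmatch(pattern, normalized):
            return a_type
    return "UNKNOWN"

q = "Who was the 23rd president of the United States?"
print(normalize(q), "->", answer_type(q))   # %who%be@person -> PERSON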

Page 69: Natural Language Question Answering

Gary G. Lee, Postech

69

Dynamic Passage Selection (1)

Analysis of answer passages for TREC-10 questions:
• About 80% of query terms occurred within a 50-word (or 3-sentence) window centered on the answer string

Definition of passage:
• Prefer a sentence window to a word window
• Prefer dynamic passages to static passages
• A passage is defined as any three consecutive sentences (sketched below)

SiteQ system
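A minimal sketch of such dynamic passages: every window of three consecutive sentences is a candidate, scored here by unique question-term overlap. The naive sentence splitter and the scoring function are simplifications.

import re

def passages(text, size=3):
    """Yield every window of `size` consecutive sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i in range(max(1, len(sentences) - size + 1)):
        yield " ".join(sentences[i:i + size])

def best_passage(text, question):
    q_terms = set(re.findall(r"\w+", question.lower()))
    return max(passages(text),
               key=lambda p: len(q_terms & set(re.findall(r"\w+", p.lower()))))

doc = ("Light is electromagnetic radiation. Its velocity is 300,000 km "
       "per second. This value was measured precisely. Sound is far "
       "slower. It travels about 343 m per second in air.")
print(best_passage(doc, "What is the velocity of light?"))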

Page 70: Natural Language Question Answering

Gary G. Lee, Postech

70

Answer Extraction

• Pivot creation
• Query-term detection
• Noun-phrase chunking
• Answer detection core: answer detection; query-term filtering
• Answer scoring; answer filtering
• AAD processing

Resources: term list, stop-word list, stemming (Porter's), WordNet.

SiteQ system

Page 71: Natural Language Question Answering

Gary G. Lee, Postech

71

Answer Detection (1)

… The/DT velocity/NN of/IN light/NN is/VBZ 300,000/CD km/NN per/IN second/JJ in/IN the/DT physics/NN textbooks/NNS …

Matching LSP: cd@unit_length%per@unit_time → speed|1|4|4

SiteQ system

Page 72: Natural Language Question Answering

Gary G. Lee, Postech

72

Answer Scoring

Based on:
• Passage score
• Number of unique terms matched in the passage
• Distances from each matched term

AScore = PScore + LSPwgt × (ptuc / qtuc) × (1 / (1 + avgdist))

PScore: the passage score; ptuc: the number of unique question terms matched in the passage; qtuc: the total number of unique question terms; LSPwgt: the weight of the LSP grammar; avgdist: the average distance between the answer candidate and each matched term.

SiteQ system
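A direct transcription of the scoring as reconstructed above. Since the equation on the original slide is not legible, the exact combination is an assumption; only the ingredients (passage score, matched-term ratio, LSP weight, average distance) come from the slide.

def answer_score(pscore, lsp_wgt, ptuc, qtuc, avgdist):
    """Reconstructed SiteQ-style answer score: the passage score boosted by
    the LSP weight and the fraction of question terms matched, damped by
    the average candidate-to-term distance. The combination is an
    assumption, not the verbatim published formula."""
    return pscore + lsp_wgt * (ptuc / qtuc) * (1.0 / (1.0 + avgdist))

# Same passage and LSP; candidate close to vs. far from the matched terms:
print(answer_score(pscore=2.0, lsp_wgt=1.5, ptuc=4, qtuc=5, avgdist=2.0))
print(answer_score(pscore=2.0, lsp_wgt=1.5, ptuc=4, qtuc=5, avgdist=10.0))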

Page 73: Natural Language Question Answering

Gary G. Lee, Postech

73

Performance in TREC-10

Q-type            freq   MRR (strict)   MRR (lenient)
how + adj/adv       31      0.316          0.332
how do               2      0.250          0.250
what do             24      0.050          0.050
what is            242      0.308          0.320
what/which noun     88      0.289          0.331
when                26      0.362          0.362
where               27      0.515          0.515
who                 46      0.464          0.471
why                  4      0.125          0.125
name a               2      0.500          0.500
Total              492

Rank        # of Qs (strict)   # of Qs (lenient)   Avg. of 67 runs
1                121                124                 88.58
2                 45                 49                 28.24
3                 24                 29                 20.46
4                 15                 16                 12.57
5                 11                 14                 12.46
No answer        276                260                329.7
MRR              0.320              0.335                0.234

SiteQ system

Page 74: Natural Language Question Answering

Gary G. Lee, Postech

74

Performance in QAC-1

• Returned 994 answer candidates, among which 183 responses were judged correct
• MRR is 0.608, the best performance in the evaluation (the average is 0.303)

Questions   Answers   Output   Correct
   200        288       994      183

Recall   Precision   F-measure   MRR
0.635      0.184       0.285     0.608

Rank    # of questions
 1            98
 2            27
 3            14
 4             6
 5             4
Total        149

SiteQ system

Page 75: Natural Language Question Answering

Gary G. Lee, Postech

75

Contents

• Introduction: QA as a high-precision search
• Architecture of QA systems
• Main components for QA
• A QA system: SiteQ
• Issues for advanced QA

Page 76: Natural Language Question Answering

Gary G. Lee, Postech

76

Question Interpretation

Use pragmatic knowledge:

"Will Prime Minister Mori survive the political crisis?"

The analyst does not mean this literally ("Will Prime Minister Mori still be alive when the political crisis is over?") but rather expresses the belief that the current political crisis might cost the Japanese Prime Minister his job.

• Decompose complex questions into a series of simpler questions
• Expand the question with contextual information

Page 77: Natural Language Question Answering

Gary G. Lee, Postech

77

Semantic Parsing

• Use the explicit representation of semantic roles as encoded in FrameNet
  • FrameNet provides a database of words with associated semantic roles
  • Enables an efficient, computable semantic representation of questions and texts
• Filling in semantic roles
  • Prominent role for named entities
  • Generalization across domains
  • FrameNet coverage and fallback techniques

Page 78: Natural Language Question Answering

Gary G. Lee, Postech

78

Information Fusion

• Combining answer fragments into a concise response
• Technologies:
  • Learning paraphrases, syntactic and lexical
  • Detecting important differences
  • Merging descriptions
  • Learning and formulating content plans

Page 79: Natural Language Question Answering

Gary G. Lee, Postech

79

Answer Generation

• Compare request fills from several answer extractions to derive the answer
• Merge / remove duplicates
• Detect ambiguity
• Reconcile answers (secondary search)
• Identify possible contradictions
• Apply utility measures

Page 80: Natural Language Question Answering

Gary G. Lee, Postech

80

And more …

• Real-time QA
• Multi-lingual QA
• Interactive QA
• Advanced reasoning for QA
• User profiling for QA
• Collaborative QA
• Spoken language QA