I256 Applied Natural Language Processing Fall 2009 Question answering Barbara Rosario

Page 1:

I256
Applied Natural Language Processing
Fall 2009
Question answering
Barbara Rosario

Page 2:

QA: Outline

• Introduction
• Factoid QA
• Three stages of a typical QA system
  – Question processing
  – Passage retrieval
  – Answer processing
• Evaluation of factoid answers
• Complex questions
• Acknowledgments
  – Speech and Language Processing, Jurafsky and Martin (chapter 23)
  – Some slides adapted from Manning, Harabagiu, Kusmerick, ISI

Page 3:

The Problem of Question Answering (QA)

What is the nationality of Pope John Paul II?

… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…

Natural language question, not keyword queries
Short text fragment, not URL list

Answer: Polish

Page 4:

People want to ask questions…

Examples from the AltaVista query log:
who invented surf music?
how to make stink bombs
where are the snowdens of yesteryear?
which english translation of the bible is used in official catholic liturgies?
how to do clayart
how to copy psx
how tall is the sears tower?

Examples from the Excite query log:
how can i find someone in texas
where can i find information on puritan religion?
what are the 7 wonders of the world
how can i eliminate stress
What vacuum cleaner does Consumers Guide recommend

Page 5:

A spectrum of question types

Factoids (Question/Answer):
• What is the typical height of a giraffe?
• Where is Apple based?

Complex questions (QA, Text Data Mining):
• What are some promising untried treatments for Raynaud’s disease?

Browse and Build:
• What are some good ideas for landscaping my client’s yard?

Page 6:

Factoid QA

• Factoid QA: the information required is a simple fact
  – Examples:
    • Where is the Louvre Museum located?
    • What currency is used in China?
    • What is the official language of Algeria?
• Fundamental problem: the gap between the way questions are posed and the way answers are expressed in the text
  – User question:
    • What company sells the most greeting cards?
  – Potential document answer:
    • Hallmark remains the largest maker of greeting cards
• Need to process both questions and answers and then “match” them

Page 7:

Page 8:

Is this good? What is the problem?

Page 9:

Typical Structure of a QA-System

[Architecture diagram: the Question enters (1) Query Processing (query formulation and query classification), which produces a Query and an Answer Type; the Query drives an IR engine over a Corpus or the Web for (2) Passage retrieval; (3) Answer processing then produces the Answer.]

Three stages:

1. Question processing
2. Passage retrieval
3. Answer processing

Page 10:

1) Question processing

• Goal: given a natural language question, extract:
  1. A keyword query suitable as input to an IR system
     – Query formulation
  2. The answer type (a specification of the kind of entity that would constitute a reasonable answer to the question)
     – Question classification

Page 11:

1) Question processing: Query formulation

• Extract lexical terms (keywords) from the question
  – possibly expanded with lexical/semantic variations (especially for smaller sets of documents)

Page 12:

Lexical Terms Extraction

• Questions approximated by sets of unrelated words (lexical terms)
• Similar to bag-of-words IR models

Question (from the TREC QA track) | Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer
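The extraction step above can be sketched as follows: approximate the question by its content words, dropping stopwords and the wh-word. The stopword list here is a small illustrative subset, not the one used by any particular TREC system.

```python
# A minimal sketch of lexical term extraction for the keyword query.
import re

STOPWORDS = {
    "what", "who", "where", "when", "why", "how", "which",
    "is", "was", "were", "are", "do", "does", "did", "the",
    "a", "an", "of", "in", "on", "to", "much", "many",
}

def lexical_terms(question: str) -> list[str]:
    """Return the keyword query for an IR system: content words, in order."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*|\d+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(lexical_terms("What does the Peugeot company manufacture?"))
# ['Peugeot', 'company', 'manufacture']
```

Running it on Q004 likewise yields `['Mercury', 'spend', 'advertising', '1993']`, matching the table.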

Page 13:

Keyword Selection Examples

• What researcher discovered the vaccine against Hepatitis-B?
  – Hepatitis-B, vaccine, discover, researcher
• What is the name of the French oceanographer who owned Calypso?
  – Calypso, French, own, oceanographer
• What U.S. government agency registers trademarks?
  – U.S., government, trademarks, register, agency
• What is the capital of Kosovo?
  – Kosovo, capital

Page 14:

1) Question processing: Query reformulation

• Keyword selection
• Query reformulation
• Apply a set of query reformulation rules to the query
  – To make it look like a substring of possible declarative answers
  – “when was the laser invented” → “the laser was invented”
  – Send the reformulation to the search engine
  – Rule examples (Lin 07):
    • wh-word did A verb B → A verb+ed B
    • Where is A → A is located in
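The rewrite rules above can be sketched as regex rewrites. The rule set and the per-rule weights here are illustrative, not the full set from Lin (2007).

```python
# A sketch of query reformulation: turn a question into declarative
# substrings of likely answers, each tagged with a rule weight that can
# later be used to weight the n-grams harvested from the results.
import re

REWRITE_RULES = [
    # (pattern, replacement, illustrative rule weight)
    (r"^where is (.+?)\??$", r"\1 is located in", 2.0),
    (r"^when was (.+?) invented\??$", r"\1 was invented", 2.0),
    (r"^who (invented|created|discovered) (.+?)\??$", r"\2 was \1 by", 1.5),
]

def reformulate(question: str):
    """Return (declarative rewrite, rule weight) pairs for the question."""
    q = question.strip().lower()
    out = []
    for pattern, repl, weight in REWRITE_RULES:
        if re.match(pattern, q):
            out.append((re.sub(pattern, repl, q), weight))
    return out

print(reformulate("When was the laser invented?"))
# [('the laser was invented', 2.0)]
```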

Page 15:

1) Question processing: Query Classification

• Classify the question by its expected answer type
• Important both at the retrieval phase and the answer presentation phase
  – “who is Zhou Enlai” may use a biographic-specific template

Page 16:

Query Classification: Question Stems and Answer Types

Question | Question stem | Answer type
Q555: What was the name of Titanic’s captain? | What | Person
Q654: What U.S. Government agency registers trademarks? | What | Organization
Q162: What is the capital of Kosovo? | What | City
Q661: How much does one ton of cement cost? | How much | Quantity

• Other question stems: Who, Which, Name, How hot...
• Other answer types: Country, Number, Product...

Page 17:

Detecting the Expected Answer Type

• In some cases, the question stem is sufficient to indicate the answer type (AT)
  – Why → REASON
  – When → DATE
• In many cases, the question stem is ambiguous
  – Examples:
    • What was the name of Titanic’s captain?
    • What U.S. Government agency registers trademarks?
    • What is the capital of Kosovo?
  – Solution: select additional question concepts (AT words) that help disambiguate the expected answer type
    • captain, agency, capital/city
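The two signals above can be combined as follows: trust an unambiguous question stem when there is one, otherwise fall back to an AT word. Both lookup tables here are tiny illustrative stand-ins for real rule sets.

```python
# A toy expected-answer-type detector: stem lookup, then AT-word fallback.
STEM_TO_AT = {"why": "REASON", "when": "DATE", "where": "LOCATION"}
AT_WORD_TO_AT = {"captain": "PERSON", "agency": "ORGANIZATION", "capital": "CITY"}

def expected_answer_type(question: str) -> str:
    words = question.lower().rstrip("?").split()
    if words[0] in STEM_TO_AT:           # unambiguous stem
        return STEM_TO_AT[words[0]]
    for w in words[1:]:                  # ambiguous stem such as "what":
        if w in AT_WORD_TO_AT:           # fall back to an AT word
            return AT_WORD_TO_AT[w]
    return "UNKNOWN"

print(expected_answer_type("When was Mozart born?"))           # DATE
print(expected_answer_type("What is the capital of Kosovo?"))  # CITY
```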

Page 18:

Answer Type Taxonomy

• Rich set of AT, often hierarchical

• Hierarchical AT taxonomies can be built by hand or dynamically from WordNet

Page 19:

Answer Type Taxonomy

• Encodes 8707 English concepts to help recognize the expected answer type
• Mapping to parts of WordNet done by hand

Page 20:

Answer Type Detection Algorithms

• AT detection accuracy is high on easy ATs such as PERSON, LOCATION, TIME
• Detecting REASON and DESCRIPTION questions can be much harder
• The derivation of the answer type is the main source of unrecoverable errors in a QA system

• Hand-written rules
  – The Webclopedia QA typology contains 276 rules with 180 answer types
• Supervised machine learning (classification)
  – Typical features include the words, POS, named entities, and headwords (words that give extra information: the headword of the first NP after the wh-word; “which is the state flower of California?”)
• Using WordNet and an AT taxonomy

Page 21:

Answer Type Detection Algorithm with AT taxonomy

• Map the AT word into a previously built AT hierarchy
  – The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. “writer” → PERSON.
• Select the AT(s) from the first hypernym(s) associated with a semantic category.
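The hypernym walk can be sketched as follows. A real system would read hypernyms from WordNet; here a tiny hand-coded fragment of the noun hierarchy (the one in the diagram on page 22) stands in for it.

```python
# A sketch of AT detection by hypernym walk: climb from the AT word
# toward the root until a concept mapped to a semantic category is found.
HYPERNYM = {  # word -> its direct hypernym (illustrative WordNet fragment)
    "oceanographer": "scientist",
    "researcher": "scientist",
    "scientist": "person",
    "actress": "actor",
    "actor": "performer",
    "performer": "person",
}
CATEGORY = {"person": "PERSON"}  # concepts mapped to semantic categories

def answer_type(at_word: str) -> str:
    """Climb hypernyms until a concept with a semantic category is found."""
    concept = at_word
    while concept is not None:
        if concept in CATEGORY:
            return CATEGORY[concept]
        concept = HYPERNYM.get(concept)  # step up to the hypernym, if any
    return "UNKNOWN"

print(answer_type("oceanographer"))  # PERSON
```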

Page 22:

Answer Type Detection Algorithm with AT taxonomy

[Diagram: a fragment of the WordNet noun hierarchy. researcher, oceanographer and chemist are hyponyms of scientist / man of science; American, islander / island-dweller and westerner fall under inhabitant / dweller / denizen; actor, actress, dancer, ballet dancer and tragedian fall under performer / performing artist. All of these lead up to the semantic category PERSON. For “What researcher discovered the vaccine against Hepatitis-B?” the AT word researcher maps to PERSON; for “What is the name of the French oceanographer who owned Calypso?” the AT word oceanographer likewise maps to PERSON.]

Page 23:

QA stages

1. Question processing
   • Query formulation
   • Question classification
2. Passage retrieval
3. Answer processing

Page 24:

2) Passage retrieval

• The IR system returns a set of documents
• A passage can be a sentence, paragraph, or section
• Passage retrieval: extract a set of potential answer passages from the retrieved documents by:
  1. Filtering out passages that don’t contain potential answers to the question
     • Named entity recognition or answer type classification
  2. Ranking the rest according to how likely they are to contain the answer
     • Hand-built rules
     • Machine learning

Page 25:

2) Passage retrieval (ranking)

• Most common features for passage ranking:
  – Number of named entities of the right type in the passage
  – Number of question keywords in the passage
  – The longest exact sequence of question keywords that occurs in the passage
  – The rank of the document from which the passage was extracted
  – The proximity of the keywords from the original query to each other (to prefer smaller spans that include more keywords)
  – The N-gram overlap between the passage and the question (to prefer passages with higher N-gram overlap with the question)
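Two of the features above (keyword count and keyword proximity) can be combined into a toy passage scorer; the feature weights here are arbitrary illustrative values, not tuned ones.

```python
# A toy passage scorer: reward matched question keywords, and prefer
# passages where the matched keywords sit in a small span.
def score_passage(passage: str, keywords: list[str]) -> float:
    kw = {k.lower() for k in keywords}
    tokens = passage.lower().split()
    positions = [i for i, t in enumerate(tokens) if t.strip(".,?") in kw]
    if not positions:
        return 0.0
    n_matched = len({tokens[i].strip(".,?") for i in positions})
    span = max(positions) - min(positions) + 1  # proximity: smaller is better
    return 2.0 * n_matched + 1.0 / span

ranked = sorted(
    ["The Louvre Museum is located in Paris .",
     "A museum was located near the old town ."],
    key=lambda p: score_passage(p, ["Louvre", "Museum", "located"]),
    reverse=True)
print(ranked[0])  # the first passage wins: 3 keywords in a span of 4 tokens
```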

Page 26:

2) Passage retrieval

• For QA from the Web we may skip the passage retrieval step by relying on the snippets produced by the web search engine
  – Ex: when was movable type metal printing invented in Korea?

Page 27:

QA stages

1. Question processing
   • Query formulation
   • Question classification
2. Passage retrieval
   • Filter out passages
   • Rank them
3. Answer processing

Page 28:

3) Answer processing

• Extract a specific answer from the passage

• Two main classes of algorithms

1. Answer-type pattern extraction

2. N-gram tiling

Page 29:

3) Answer processing: pattern extraction

• Use information about the expected AT together with regular expressions
  – If the AT is HUMAN, extract named entities of type HUMAN from the passage
• Some ATs (DEFINITION, for example) don’t have a particular named entity type
  – Regex patterns (written by hand or learnt automatically)

Pattern | Question | Answer
<AP> such as <QP> | What is autism? | “, developmental disorders such as autism”
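The “&lt;AP&gt; such as &lt;QP&gt;” pattern above can be sketched as a regex in which the question phrase (QP) is interpolated and the answer phrase (AP) is the captured group. The pattern set here is illustrative, not a learnt one.

```python
# A sketch of regex answer-pattern extraction for DEFINITION questions.
import re

DEFINITION_PATTERNS = [
    r"([\w\s-]+?)\s+such as\s+{qp}",          # <AP> such as <QP>
    r"{qp}\s*,\s*(?:a|an|the)\s+([\w\s-]+)",  # <QP>, a <AP>
]

def extract_definition(question_phrase, passage):
    """Return the answer phrase captured by the first matching pattern."""
    for template in DEFINITION_PATTERNS:
        m = re.search(template.format(qp=re.escape(question_phrase)),
                      passage, re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None

print(extract_definition(
    "autism", "research on developmental disorders such as autism has grown"))
```

Note that a naive capture group can over-extend to the left; a real system would trim the answer phrase to an NP boundary.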

Page 30:

Answer processing: N-gram tiling (AskMSR System Architecture)

[Diagram of the AskMSR pipeline with five numbered components; steps 3, 4 and 5 (gathering, filtering, and tiling n-grams) are described on the following slides.]

Page 31:

Step 3: Gathering N-Grams

• Enumerate all N-grams (N=1,2,3) in all retrieved snippets
• Weight of an n-gram: its occurrence count, each occurrence weighted by the “reliability” (weight) of the rewrite rule that fetched the document
  – Example: “Who created the character of Scrooge?”

    Dickens 117
    Christmas Carol 78
    Charles Dickens 75
    Disney 72
    Carl Banks 54
    A Christmas 41
    Christmas Carol 45
    Uncle 31
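The gathering step can be sketched as follows: harvest all 1-, 2- and 3-grams from the snippets, crediting each occurrence with the weight of the rewrite rule whose query fetched that snippet. The snippets and weights below are made-up examples, not AskMSR output.

```python
# A sketch of step 3: weighted n-gram counts over retrieved snippets.
from collections import Counter

def gather_ngrams(snippets_with_weights):
    """snippets_with_weights: list of (snippet_text, rewrite_rule_weight)."""
    scores = Counter()
    for text, weight in snippets_with_weights:
        tokens = text.split()
        for n in (1, 2, 3):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

snippets = [
    ("Scrooge was created by Charles Dickens", 2.0),  # from a strong rewrite
    ("Dickens wrote A Christmas Carol", 1.0),         # from a weak rewrite
]
scores = gather_ngrams(snippets)
print(scores["Dickens"], scores["Charles Dickens"])  # 3.0 2.0
```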

Page 32:

Step 4: Filtering N-Grams

• N-grams are scored by how well they match the predicted answer type
  – Boost the score of n-grams that match the regexp
  – Lower the score of n-grams that don’t match the regexp

Page 33:

Step 5: Tiling the Answers

Concatenate overlapping N-gram fragments into longer answers.

[Diagram: the candidates “Dickens” (score 20), “Charles Dickens” (score 15) and “Mr Charles” (score 10) are merged into “Mr Charles Dickens” (score 45); the old n-grams are discarded.]
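The tiling step can be sketched as a greedy merge: repeatedly combine two candidates whose token sequences overlap (or where one subsumes the other), summing their scores and discarding the old n-grams, exactly as in the example above. This is a simplified sketch of the AskMSR procedure, not its exact algorithm.

```python
# A sketch of n-gram tiling: merge overlapping fragments, sum scores.
def tile(a, b):
    """Join b onto a if a's suffix equals b's prefix; else return None."""
    ta, tb = a.split(), b.split()
    for k in range(min(len(ta), len(tb)), 0, -1):
        if ta[-k:] == tb[:k]:
            return " ".join(ta + tb[k:])
    return None

def tile_answers(candidates):
    """candidates: dict of n-gram -> score. Returns the tiled candidates."""
    cands = dict(candidates)
    merged = True
    while merged:
        merged = False
        items = sorted(cands, key=cands.get, reverse=True)
        for a in items:
            for b in items:
                if a == b:
                    continue
                t = tile(a, b) or tile(b, a)
                if t:  # merge: discard the old n-grams, sum their scores
                    score = cands.pop(a) + cands.pop(b)
                    cands[t] = cands.get(t, 0) + score
                    merged = True
                    break
            if merged:
                break
    return cands

print(tile_answers({"Dickens": 20, "Mr Charles": 15, "Charles Dickens": 10}))
# {'Mr Charles Dickens': 45}
```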

Page 34:

QA stages

1. Question processing
   • Query formulation
   • Question classification
2. Passage retrieval
   • Filter out passages
   • Rank them
3. Answer processing
   • Answer-type pattern extraction
   • N-gram tiling

• Evaluation of factoid answers

Page 35:

Evaluation of factoid answers

• A variety of techniques have been proposed
• Most influential evaluation framework: the TREC (Text REtrieval Conference) QA track
  – http://trec.nist.gov/

Page 36:

Question Answering at TREC

• The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., “When was Mozart born?”
• Has really pushed the field forward.
• The document set
  – Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etcetera: over 1M documents now.
  – Well-formed lexically, syntactically and semantically (they were reviewed by professional editors).
• The questions
  – Hundreds of new questions every year; the total is ~2400
• The task
  – Extract only one exact answer.
  – Several other sub-tasks were added later: definition, list, biography.

Page 37:

Sample TREC questions

1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?

2. What was the monetary value of the Nobel Peace Prize in 1989?

3. What does the Peugeot company manufacture?

4. How much did Mercury spend on advertising in 1993?

5. What is the name of the managing director of Apricot Computer?

6. Why did David Koresh ask the FBI for a word processor?

7. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Page 38:

TREC Scoring

• Systems return 5 ranked answer snippets for each question.
  – Mean Reciprocal Rank (MRR) scoring:
    • Each question is assigned the reciprocal rank of the first correct answer: if the correct answer is at position k, the score is 1/k.
    • 1, 0.5, 0.33, 0.25, 0.2, 0 for positions 1, 2, 3, 4, 5, 6+
  – The score of a system is the average of the scores over all questions
  – For a system returning ranked answers for a test set of N questions:

    MRR = (1/N) Σ_{i=1}^{N} 1/rank_i
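The MRR formula above translates directly into code. Each entry of `ranks` is the position of the first correct answer for one question, or `None` if no correct answer appeared in the top 5 (contributing 0).

```python
# A direct implementation of Mean Reciprocal Rank scoring.
def mean_reciprocal_rank(ranks):
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

# Four questions: answered at ranks 1, 2 and 5, plus one unanswered.
print(mean_reciprocal_rank([1, 2, 5, None]))  # (1 + 0.5 + 0.2 + 0) / 4 = 0.425
```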

Page 39:

Top Performing Systems

• In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions
• Approaches and successes have varied a fair deal
  – Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
    • Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)
  – Statistical systems are starting to catch up
    • The AskMSR system stressed how much could be achieved by very simple methods with enough text
    • People are experimenting with machine learning methods

Page 40:

QA stages

1. Question processing
   • Query formulation
   • Question classification
2. Passage retrieval
   • Filter out passages
   • Rank them
3. Answer processing
   • Answer-type pattern extraction
   • N-gram tiling

• Evaluation of factoid answers
• Complex questions

Page 41:

Focused Summarization and QA

• The most interesting/important questions are not factoids
  – In children with acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
  – Where have poachers endangered wildlife, what wildlife has been endangered, and what steps have been taken to prevent poaching?
• Factoids may be found in a single document; more complex questions may require analysis and synthesis from multiple sources
  – Summarization techniques: (query-)focused summarization
  – Information extraction techniques

Page 42:

Structure of a complex QA system

[Architecture diagram for the example question “What are some promising untried treatments for Raynaud’s disease?”: the Question goes through Query Processing (query formulation and query classification, yielding a Query and an Answer Type); the Query drives IR over a Corpus or the Web; this is followed by Predicate identification, Data-Driven analysis, and Definition creation, which produces the Answer.]

Four stages:

1. Question processing
2. Predicate identification
3. Data driven analysis
4. Definition creation

Page 43:

Stages of a complex QA system

• Predicate identification
  – Information extraction (identification of the appropriate semantic entities, e.g. DISEASE)
• Data driven analysis
  – Summarization, co-reference, inference, avoiding redundancy… All the difficult stuff!
• Definition creation
  – If domain specific, can have templates for information ordering
  – E.g., for biography questions, may use a template such as:
    <NAME> is <WHY FAMOUS>. She was born in <BIRTHDATE>. She <EDUCATION>. <DESCRIPTIVE SENTENCE>.