
1

SIMS 290-2: Applied Natural Language Processing

Marti Hearst
November 17, 2004

2

Today

Using WordNet in QA
Other Resources in QA
Semantic Reasoning in QA
Definition Questions
Complex Questions

3
Adapted from slide by Shauna Eggers

Question Answering

Captures the semantics of the question by recognizing:
the expected answer type (i.e., its semantic category)
the relationship between the answer type and the question concepts/keywords

The Q/A process:

Question processing – Extract concepts/keywords from question

Passage retrieval – Identify passages of text relevant to query

Answer extraction – Extract answer words from passage

Relies on standard IR and IE techniques

Proximity-based features
– The answer often occurs in text near the question keywords

Named-entity recognizers
– Categorize proper names into semantic types (persons, locations, organizations, etc.)
– Map semantic types to question types (“How long”, “Who”, “What company”)
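The mapping from question cues to expected named-entity types can be illustrated with a small lookup table. A minimal sketch, assuming an invented cue table (QTYPE_TO_NE is hypothetical), not the LCC system's actual mapping:

```python
# Minimal sketch: map question cues to expected named-entity answer types,
# then keep only candidate answers whose NE tag matches. The cue table is
# invented for illustration; real systems use much richer taxonomies.
QTYPE_TO_NE = {
    "who": "PERSON",
    "when": "DATE",
    "where": "LOCATION",
    "how long": "QUANTITY",
    "what company": "ORGANIZATION",
}

def expected_answer_type(question):
    q = question.lower()
    # Try longer cues first so "what company" beats "who"/"when".
    for cue in sorted(QTYPE_TO_NE, key=len, reverse=True):
        if q.startswith(cue):
            return QTYPE_TO_NE[cue]
    return None

def filter_candidates(candidates, question):
    """candidates: (answer_string, ne_tag) pairs produced by an NE recognizer."""
    atype = expected_answer_type(question)
    return [a for a, tag in candidates if atype is None or tag == atype]

print(expected_answer_type("What company makes the PowerBook?"))   # ORGANIZATION
print(filter_candidates([("Apple", "ORGANIZATION"), ("1991", "DATE")],
                        "What company makes the PowerBook?"))       # ['Apple']
```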

4
Adapted from slide by Harabagiu and Narayanan

The Importance of NER

The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named Entities.
In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions.
The Named Entity Recognizer was responsible for 234 of them:

QUANTITY 55          ORGANIZATION 15     PRICE 3
NUMBER 45            AUTHORED WORK 11    SCIENCE NAME 2
DATE 35              PRODUCT 11          ACRONYM 1
PERSON 31            CONTINENT 5         ADDRESS 1
COUNTRY 21           PROVINCE 5          ALPHABET 1
OTHER LOCATIONS 19   QUOTE 5             URI 1
CITY 19              UNIVERSITY 3

5
Adapted from slide by Harabagiu and Narayanan

The Special Case of Names

1934: What is the play “West Side Story” based on?
Answer: Romeo and Juliet

1976: What is the motto for the Boy Scouts?
Answer: Be prepared.

1982: What movie won the Academy Award for best picture in 1989?
Answer: Driving Miss Daisy

2080: What peace treaty ended WWI?
Answer: Versailles

2102: What American landmark stands on Liberty Island?
Answer: Statue of Liberty

Questions asking for names of authored works

6
Adapted from slide by Shauna Eggers

Problems

NE assumes all answers are named entities
Oversimplifies the generative power of language!
What about: “What kind of flowers did Van Gogh paint?”

Does not account well for morphological, lexical, and semantic alternations

Question terms may not exactly match answer terms; connections between alternations of Q and A terms are often not documented in a flat dictionary
Example: in “When was Berlin’s Brandenburger Tor erected?”, “erected” is not guaranteed to match “built” in the answer text
Recall suffers

7
Adapted from slide by Shauna Eggers

LCC Approach: WordNet to the rescue!

WordNet can be used to inform all three steps of the Q/A process:
1. Answer-type recognition (Answer Type Taxonomy)
2. Passage Retrieval (“specificity” constraints)
3. Answer extraction (recognition of keyword alternations)

Using WN’s lexico-semantic info: Examples

“What kind of flowers did Van Gogh paint?”
– Answer-type recognition: need to know (a) that the answer is a kind of flower, and (b) the sense of the word flower
– WordNet encodes 470 hyponyms of flower sense #1 (flowers as plants)
– Nouns from retrieved passages can be searched against these hyponyms

“When was Berlin’s Brandenburger Tor erected?”
– Semantic alternation: erect is a hyponym of sense #1 of build
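The hyponym check described above can be sketched with NLTK's WordNet interface (assumes nltk and its wordnet data are installed; synset names follow current NLTK conventions, which may differ slightly from the 2004 WordNet):

```python
# Sketch: test whether candidate nouns from retrieved passages fall under
# flower sense #1 (the plant sense) in WordNet, via its transitive hyponyms.
from nltk.corpus import wordnet as wn

flower = wn.synset('flower.n.01')                        # the "flowers as plants" sense
flower_hyponyms = set(flower.closure(lambda s: s.hyponyms()))

def is_kind_of_flower(noun):
    """True if any noun sense of `noun` is flower.n.01 or one of its hyponyms."""
    return any(s == flower or s in flower_hyponyms
               for s in wn.synsets(noun, pos=wn.NOUN))

for noun in ["sunflower", "chair"]:
    print(noun, is_kind_of_flower(noun))   # sunflower should be True, chair False
```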

8

WN for Answer Type Recognition

Encodes 8707 English concepts to help recognize the expected answer type
Mapping to parts of WordNet done by hand

Can connect to Noun, Adj, and/or Verb subhierarchies

9
Adapted from slide by Shauna Eggers

WN in Passage Retrieval

Identify relevant passages from text:
Extract keywords from the question, and
Pass them to the retrieval module

“Specificity” – filtering question concepts/keywords

Focuses the search, improves performance and precision
Question keywords can be omitted from the search if they are too general
Specificity is calculated by counting the hyponyms of a given keyword in WordNet
– The count ignores proper names and same-headed concepts
– A keyword is thrown out if its count is above a given threshold (currently 10)
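The specificity filter can be approximated with NLTK as below. This is a rough sketch that counts transitive noun hyponyms and applies the threshold of 10 from the slide; the proper-name and same-headed-concept refinements are omitted, and the paper may count hyponyms differently:

```python
# Sketch: drop question keywords that are "too general", i.e. whose WordNet
# hyponym count exceeds a threshold (10, per the slide).
from nltk.corpus import wordnet as wn

THRESHOLD = 10

def hyponym_count(word):
    count = 0
    for synset in wn.synsets(word, pos=wn.NOUN):
        count += sum(1 for _ in synset.closure(lambda s: s.hyponyms()))
    return count

def keep_keyword(word):
    return hyponym_count(word) <= THRESHOLD

for kw in ["tulip", "flower"]:
    print(kw, hyponym_count(kw), keep_keyword(kw))
# "flower" has hundreds of hyponyms and is dropped; "tulip" has few and is kept.
```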

10
Adapted from slide by Shauna Eggers

WN in Answer Extraction

If keywords alone cannot find an acceptable answer, look for alternations in WordNet!

Q196: Who wrote “Hamlet”?
Morphological alternation: wrote -> written
Answer: before the young playwright has written Hamlet – and Shakespeare seizes the opportunity

Q136: Who is the queen of Holland?
Lexical alternation: Holland -> Netherlands
Answer: Princess Margriet, sister of Queen Beatrix of the Netherlands, was also present

Q196: What is the highest mountain in the world?
Semantic alternation: mountain -> peak
Answer: first African country to send an expedition to Mount Everest, the world’s highest peak
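The three kinds of alternations can be approximated with WordNet via NLTK. A rough sketch that only enumerates candidates (a real system would weight and filter them):

```python
# Sketch: generate morphological, lexical, and semantic alternations for a
# keyword. Morphological uses WordNet's morphy; lexical uses synset lemmas;
# semantic uses direct hypernyms/hyponyms.
from nltk.corpus import wordnet as wn

def alternations(word, pos=wn.NOUN):
    morphological = {wn.morphy(word, pos) or word}            # e.g. written -> write (for verbs)
    lexical, semantic = set(), set()
    for s in wn.synsets(word, pos=pos):
        lexical.update(l.name() for l in s.lemmas())          # synonyms in the same synsets
        for rel in s.hypernyms() + s.hyponyms():
            semantic.update(l.name() for l in rel.lemmas())   # nearby, more/less general concepts
    return morphological, lexical, semantic

_, lexical, _ = alternations('Holland')
print('Netherlands' in lexical)   # the Holland -> Netherlands alternation should be found
```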

11
Adapted from slide by Shauna Eggers

Evaluation

Paşca/Harabagiu (NAACL’01 Workshop) measured the approach using the TREC-8 and TREC-9 test collections.

WN contributions to Answer Type Recognition:
Count the number of questions for which acceptable answers were found; 3 GB text collection, 893 questions.

Method (# questions with correct answer type)   | All       | What only
Flat dictionary (baseline)                      | 227 (32%) | 48 (13%)
A-type taxonomy (static)                        | 445 (64%) | 179 (50%)
A-type taxonomy (dynamic)                       | 463 (67%) | 196 (56%)
A-type taxonomy (dynamic + answer patterns)     | 533 (76%) | 232 (65%)

12
Adapted from slide by Shauna Eggers

Evaluation

WN contributions to Passage Retrieval

Impact of keyword alternations (precision):

No alternations enabled                     | 55.3%
Lexical alternations enabled                | 67.6%
Lexical + semantic alternations enabled     | 73.7%
Morphological expansions enabled            | 76.5%

Impact of specificity knowledge (# questions with correct answer in the first 5 documents returned):

Specificity knowledge | TREC-8    | TREC-9
Not included          | 133 (65%) | 463 (67%)
Included              | 151 (76%) | 515 (74%)

13

Going Beyond Word Matching

Use techniques from artificial intelligence to try to draw inferences from the meanings of the words
This is a highly unusual and ambitious approach.
Surprising that it works at all!
Requires huge amounts of hand-coded information

Uses notions of proofs and inference from logic (a toy illustration follows):
All birds fly. Robins are birds. Thus, robins fly.
forall(X): bird(X) -> fly(X)
forall(X,Y): student(X), enrolled(X,Y) -> school(Y)
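As a toy illustration of this style of inference (not the LCC prover, which works over full logical forms), the bird/fly rule can be applied by naive forward chaining:

```python
# Toy forward chaining over unary predicates: bird(X) -> fly(X).
# This only illustrates the flavor of logical inference; real QA provers
# handle n-ary predicates, unification, and axiom relaxation.
facts = {("bird", "robin")}
rules = [("bird", "fly")]          # premise predicate -> conclusion predicate

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        for pred, arg in list(facts):
            new_fact = (conclusion, arg)
            if pred == premise and new_fact not in facts:
                facts.add(new_fact)
                changed = True

print(("fly", "robin") in facts)   # True: robins fly
```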

14
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI

Inference via a Logic Prover

The LCC system attempts inference to justify an answer
Its inference engine is a kind of funny middle ground between logic and pattern matching
But quite effective: 30% improvement

Q: When was the internal combustion engine invented?
A: The first internal-combustion engine was built in 1867.

invent -> create_mentally -> create -> build

15
Adapted from slide by Surdeanu and Pasca

COGEX

World knowledge from:

WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project

Lexical chains
– game:n#3 HYPERNYM recreation:n#1 HYPONYM sport:n#1
– Argentine:a#1 GLOSS Argentina:n#1

NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.

Named-entity recognizer
– John Galt HUMAN

A relaxation mechanism is used to iteratively uncouple predicates and remove terms from logical forms (LFs). Proofs are penalized based on the amount of relaxation involved.

16
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI

Logic Inference Example

“How hot does the inside of an active volcano get?”
  get(TEMPERATURE, inside(volcano(active)))

“lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit”
  fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))

Axioms used:
  volcano ISA mountain
  lava ISPARTOF volcano -> lava inside volcano
  fragments of lava HAVEPROPERTIESOF lava

The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough ‘proofs’

17
Adapted from slide by Harabagiu and Narayanan

Axiom Creation

XWN Axioms
A major source of world knowledge is a general-purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.

Gloss: Kill is to cause to die

Logical Form: kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)

18
Adapted from slide by Harabagiu and Narayanan

Lexical Chains

Lexical chains provide an improved source of world knowledge by supplying the Logic Prover with much needed axioms to link question keywords with answer concepts.

Question: How were biological agents acquired by bin Laden?

Answer: On 8 July 1998 , the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders , which was founded by Bin Laden , purchased three chemical and biological_agent production facilities in

Lexical Chain: ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
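A chain like the one above can be checked in NLTK by walking verb hypernyms. A rough sketch using only the HYPERNYM relation (COGEX's chains use several relation types):

```python
# Sketch: does some verb sense of "purchase" reach "get"/"acquire" through
# its WordNet hypernyms? This approximates the HYPERNYM link in the chain.
from nltk.corpus import wordnet as wn

def hypernym_lemmas(word, pos):
    names = set()
    for s in wn.synsets(word, pos=pos):
        for h in s.closure(lambda x: x.hypernyms()):
            names.update(l.name() for l in h.lemmas())
    return names

targets = {"get", "acquire"}
print(bool(targets & hypernym_lemmas("purchase", wn.VERB)))   # expected: True
```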

19
Adapted from slide by Harabagiu and Narayanan

Axiom Selection

Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all the keywords in the question are found in the answer.

Question: How did Adolf Hitler die?

Answer: … Adolf Hitler committed suicide …

The following Lexical Chain is detected:
( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )

The following axioms are loaded into the Prover:

exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).

exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).

20

LCC System References

The previous set of slides drew information from these sources:

Pasca and Harabagiu, The Informative Role of WordNet in Open-Domain Question Answering, WordNet and Other Lexical Resources, NAACL 2001 Workshop

Pasca and Harabagiu, High Performance Question/Answering, SIGIR’01

Moldovan, Clark, Harabagiu, and Maiorano, COGEX: A Logic Prover for Question Answering, HLT-NAACL 2003

Moldovan, Pasca, Harabagiu, and Surdeanu, Performance Issues and Error Analysis in an Open-Domain Question Answering System, ACM Trans. Inf. Syst. 21(2): 133-154 (2003)

Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002

21

Incorporating Resources

For 29% of TREC questions the LCC QA system relied on an off-line taxonomy with semantic classes such as:

Disease
Drugs
Colors
Insects
Games

22

Incorporating Resources

How well can we do just using existing resources?

Method:
Used the 2,393 TREC questions and answer keys
Determined whether it would be possible for an algorithm to locate the target answer from the resource
So this is a kind of upper bound.

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

23

Gazetteers

Resources:
CIA World Factbook
www.astronomy.com
www.50states.com

Method:
“Can be directly answered” (not explained further)

Results:
High precision, low recall

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

24

WordNet

Resource:
WordNet glosses, synonyms, hypernyms, hyponyms

Method:
Question terms and phrases extracted and looked up.
If the answer key matched any of these WordNet resources, the answer was considered found.
Thus, measuring an upper bound.

Results:
About 27% can in principle be answered from WN alone

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

25

Definition Resources

Resources:
Wikipedia
Google’s define operator

Method:
Formulate a query from n-grams extracted from each question.

Results:
Encyclopedia most helpful
TREC-12 had fewer definition questions, so less benefit

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

26

Web Pages

Resource:
The Web

Method:
Questions tokenized and stopwords removed.
Keywords “used” (no further details) to retrieve 100 docs via the Google API.

Results:
A relevant doc is found somewhere in the results for nearly 50% of the questions.

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

27

Web N-grams

Resource:
The Web

Method:
Retrieved the top 5, 10, 15, 20, 50, and 100 docs via web query (no details provided) via the Google API
Extracted the most frequent 50 n-grams (up to trigrams)
(Not clear if using full text or summaries only)

Results:
The correct answer is found in the top 50 n-grams more than 50% of the time.

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

28

Using Machine Learning in QA

The following slides are based on:
Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill? WWW’04

29

Learning Answer Type Mapping

Idea: use machine learning techniques to automatically determine answer types and query terms from questions.

Two kinds of answer types:

Surface patterns
– Infinite set, so can’t be covered by a lexicon
– DATES, NUMBERS, PERSON NAMES, LOCATIONS
– “at DD:DD”, “in the ‘DDs”, “in DDDD”, “Xx+ said”
– Can also associate with a synset [date#n#7]

WordNet synsets
– Consider: “name an animal that sleeps upright”
– Answer: “horse”

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

30
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

Determining Answer Types

The hard ones are “what” and “which” questions.

Two useful heuristics:
If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark it as an a-type clue
Otherwise, the head of the NP appearing after the auxiliary/main verb is the a-type clue

31

Learning Answer Types

Given a QA pair (q, a):
– (“name an animal that sleeps upright”, “horse”)

(1a) See which atype(s) “horse” can map to
(1b) Look up the hypernyms of “horse” -> S
(2a) Record the k words to the right of the q-word
     – an, animal, that
(2b) For each of these k words, look up their synsets
(2c) Increment the counts for those synsets that also appear in S (a counting sketch follows the citation below)

Do significance testing:
Compare synset frequencies against a background set
Retain only those synsets that are significantly more associated with the question word than in general (chi-square)

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
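A rough sketch of the counting step (1a)-(2c), not the authors' code; the chi-square significance test against a background set is omitted:

```python
# Sketch: for a (question, answer) pair, count synsets of words near the
# question word that also appear among the answer's hypernyms. Aggregated
# over many pairs, frequently counted synsets become candidate answer types.
from collections import Counter
from nltk.corpus import wordnet as wn

def answer_hypernyms(answer):
    hyp = set()
    for s in wn.synsets(answer, pos=wn.NOUN):
        hyp.add(s)
        hyp.update(s.closure(lambda x: x.hypernyms()))
    return hyp

def update_atype_counts(right_context, answer, counts, k=3):
    """right_context: the k words to the right of the question word."""
    S = answer_hypernyms(answer)
    for w in right_context[:k]:
        for syn in wn.synsets(w, pos=wn.NOUN):
            if syn in S:
                counts[syn] += 1       # this synset also explains the answer

counts = Counter()
update_atype_counts(["an", "animal", "that"], "horse", counts)
print(counts.most_common(3))   # animal.n.01 should receive a count (horse is a hyponym)
```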

32

Learning Answer Types

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

33

Learning to Choose Query Terms

Which words from the question should be used in the query?
A tradeoff between precision and recall.

Example: “Tokyo is the capital of which country?”
Want to use “Tokyo” verbatim
Probably “capital” as well
But maybe not “country”; maybe “nation” will appear instead, or maybe this word won’t appear in the retrieved passage at all

– Also, “country” corresponds to the answer type, so probably we don’t want to require it to be in the answer text.

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

34

Learning to Choose Query Terms

Features (a feature-extraction sketch follows the citation below):
POS assigned to the word and its immediate neighbors
Starts with an uppercase letter
Is a stopword
IDF score
Is an answer type for this question
Ambiguity indicators:
– # of possible WordNet senses (NumSense)
– # of other WordNet lemmas that describe these senses, e.g., for “buck”: stag, deer, doe (NumLemma)

Learner:
J48 decision tree worked best

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
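A sketch of a few of the listed features; the feature names, toy stopword list, and IDF values below are illustrative assumptions, not the paper's exact set (the POS-of-neighbors feature is omitted):

```python
# Sketch: word-level features for deciding whether a question word should be
# used verbatim in the query. NumSense/NumLemma are the WordNet ambiguity
# indicators from the slide.
from nltk.corpus import wordnet as wn

STOPWORDS = {"is", "the", "of", "which", "a", "an"}   # toy list

def word_features(word, idf, is_answer_type):
    senses = wn.synsets(word)
    other_lemmas = {l.name() for s in senses for l in s.lemmas()} - {word, word.lower()}
    return {
        "starts_upper": word[0].isupper(),
        "is_stopword": word.lower() in STOPWORDS,
        "idf": idf,                          # from a reference corpus (values made up below)
        "is_answer_type": is_answer_type,
        "NumSense": len(senses),
        "NumLemma": len(other_lemmas),
    }

print(word_features("Tokyo", idf=7.2, is_answer_type=False))
print(word_features("country", idf=2.1, is_answer_type=True))
```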

35

Learning to Choose Query Terms: Results

WordNet ambiguity indicators were very helpful
– Raised accuracy from 71-73% to 80%

The a-type flag improved accuracy by 1-3%

36

Learning to Score Passages

Given a question, an answer, and a passage (q, a, r):
Assign +1 if r contains a
Assign –1 otherwise

Features:
Do the selected terms s from q appear in r?
Does r have an answer zone a that does not overlap with s?
Are the distances between tokens in a and s small?
Does a have a strong WordNet similarity with q’s answer type?

Learner:
Logistic regression, since it produces a ranking rather than a hard classification into +1 or –1
– Produces a continuous estimate between 0 and 1 (see the sketch below)

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
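A minimal sketch of that setup using scikit-learn (an assumed toolkit, not the paper's), with a placeholder feature extractor:

```python
# Sketch: label passages +1/-1 by whether they contain the answer, fit a
# logistic regression, and use its predicted probability as a passage score
# for reranking. The two features here are placeholders, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(question_terms, passage_tokens):
    overlap = len(set(question_terms) & set(passage_tokens))
    return [overlap, len(passage_tokens)]

q_terms = ["Tokyo", "capital"]
passages = [["Tokyo", "is", "the", "capital", "of", "Japan"],     # contains the answer
            ["Tokyo", "hosted", "the", "Olympic", "Games"]]       # does not
X = np.array([featurize(q_terms, p) for p in passages])
y = np.array([1, 0])

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])   # continuous scores in (0, 1) used for reranking
```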

37

Learning to Score Passages

Results:
F-scores are low (.33 - .56)
However, reranking greatly improves the rank of the corresponding passages
Eliminates many non-answers, pushing better passages towards the top

38
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

Learning to Score Passages

39

Computing WordNet Similarity

Path-based similarity measures are not all that good in WordNet

3 hops from entity to artifact
3 hops from mammal to elephant

An alternative: Given a target synset t and an answer synset a:

– Measure the overlap of nodes on the path from t to all noun roots and from a to all noun roots

– Algorithm for computing the similarity of t to a:
  If t is not a hypernym of a: assign 0
  Else collect the sets of hypernym synsets of t and a; call them Ht and Ha
  Compute the Jaccard overlap:
  |Ht Intersect Ha| / |Ht Union Ha|

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
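A sketch of this hypernym-overlap similarity with NLTK synsets (synset names follow current NLTK; exact counts vary by WordNet version):

```python
# Sketch: Jaccard overlap of hypernym sets, returning 0 unless the target
# synset t is a hypernym of the answer synset a, as described above.
from nltk.corpus import wordnet as wn

def hypernym_set(synset):
    nodes = {synset}
    for path in synset.hypernym_paths():   # paths from a noun root down to the synset
        nodes.update(path)
    return nodes

def atype_similarity(t, a):
    Ht, Ha = hypernym_set(t), hypernym_set(a)
    if t not in Ha:                        # t must be a hypernym of a
        return 0.0
    return len(Ht & Ha) / len(Ht | Ha)

mammal = wn.synset('mammal.n.01')
elephant = wn.synset('elephant.n.01')
print(atype_similarity(mammal, elephant))  # nonzero; exact value depends on WN version
print(atype_similarity(elephant, mammal))  # 0.0: elephant is not a hypernym of mammal
```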

40

Computing WordNet Similarity

Algorithm for computing the similarity of t to a:
– |Ht Intersect Ha| / |Ht Union Ha|

[Figure: WordNet hypernym chain from elephant up to the root (elephant, proboscidean, placental mammal, mammal, vertebrate, chordate, animal, organism, living thing, object, entity), with Ht and Ha marked as the hypernym sets used in |Ht Intersect Ha| / |Ht Union Ha|]

Ht = mammal, Ha = elephant: 7/10 = .7
Ht = animal, Ha = elephant: 5/10 = .5
Ht = animal, Ha = mammal: 4/7 = .57
Ht = mammal, Ha = fox: 7/11 = .63

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

41
Adapted from slide by Surdeanu and Pasca

System Extension: Definition Questions

Definition questions ask about the definition or description of a concept:

Who is John Galt?
What is anorexia nervosa?

Many “information nuggets” are acceptable answers
Who is George W. Bush?
– … George W. Bush, the 43rd President of the United States …
– George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas …

Scoring:
Any information nugget is acceptable
Precision score over all information nuggets

42
Adapted from slide by Surdeanu and Pasca

Definition Detection with Pattern Matching

Question patterns:
What <be> a <QP>?
Who <be> <QP>?   example: “Who is Zebulon Pike?”

Answer patterns:
<QP>, the <AP>
<QP> (a <AP>)
<AP HumanConcept> <QP>   example: “explorer Zebulon Pike”

Q386: What is anorexia nervosa?
cause of anorexia nervosa, an eating disorder...

Q358: What is a meerkat?
the meerkat, a type of mongoose, thrives in...

Q340: Who is Zebulon Pike?
in 1806, explorer Zebulon Pike sighted the...
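The answer patterns can be approximated with regular expressions. A sketch with illustrative patterns (not LCC's full set), where QP is the question phrase and the captured group is the candidate answer phrase (AP):

```python
# Sketch: instantiate the answer patterns for a given question phrase (QP)
# and capture the answer phrase (AP). The third pattern still needs a
# separate HumanConcept check (e.g. a WordNet person test) on the capture.
import re

def answer_patterns(qp):
    qp = re.escape(qp)
    return [
        re.compile(rf"{qp},\s+(?:a|an|the)\s+([^,.]+)", re.I),   # <QP>, the <AP>
        re.compile(rf"{qp}\s+\((?:a|an)\s+([^)]+)\)", re.I),     # <QP> (a <AP>)
        re.compile(rf"(\w+)\s+{qp}"),                            # <AP HumanConcept> <QP>
    ]

print(answer_patterns("meerkat")[0].search(
    "the meerkat, a type of mongoose, thrives in the Kalahari").group(1))
# -> "type of mongoose"
print(answer_patterns("Zebulon Pike")[2].search(
    "in 1806, explorer Zebulon Pike sighted the peak").group(1))
# -> "explorer" (would then be checked against a human-concept class)
```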

43
Adapted from slide by Surdeanu and Pasca

Answer Detection with Concept Expansion

Enhancement for definition questions
Identify terms that are semantically related to the phrase to define
Use WordNet hypernyms (more general concepts)

Question            | WordNet hypernym               | Detected answer candidate
What is a shaman?   | {priest, non-Christian priest} | Mathews is the priest or shaman
What is a nematode? | {worm}                         | nematodes, tiny worms in soil
What is anise?      | {herb, herbaceous plant}       | anise, rhubarb and other herbs
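A sketch of this hypernym expansion with NLTK (direct hypernyms only; the exact hypernyms returned depend on the WordNet version):

```python
# Sketch: expand the concept to define with lemmas of its direct WordNet
# hypernyms, then check whether any expansion term shows up in a candidate
# answer passage.
from nltk.corpus import wordnet as wn

def expansion_terms(concept):
    terms = set()
    for s in wn.synsets(concept, pos=wn.NOUN):
        for h in s.hypernyms():
            terms.update(l.name().replace('_', ' ').lower() for l in h.lemmas())
    return terms

def supports_definition(concept, passage):
    text = passage.lower()
    return any(term in text for term in expansion_terms(concept))

print(expansion_terms("nematode"))                                       # should include 'worm'
print(supports_definition("nematode", "nematodes, tiny worms in soil"))  # True
```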

44
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI

Online QA Examples

Examples (none work very well):

AnswerBus:
– http://www.answerbus.com

Ionaut:
– http://www.ionaut.com:8400/

LCC:
– http://www.languagecomputer.com/demos/question_answering/index.html

45

What about Complex Questions?