Is Question Answering an Acquired Skill?

Soumen Chakrabarti

IIT Bombay

With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi

QA Chakrabarti

Web search and QA

Information need – words relating “things” + “thing” aliases = telegraphic Web queries
• Cheapest laptop with wireless → best price laptop 802.11
• Why is the sky blue? → sky blue reason
• When was the Space Needle built? → “Space Needle” history

Entity + relation extraction technology better than ever (SemTag, KnowItAll, Biotext)
• Ontology extension (e.g., is a kind of)
• List extraction (e.g., is an instance of)
• Slot-filling (author X wrote book Y)


Factoid QA

Specialize given domain to a token related to ground constants in the query
• What animal is Winnie the Pooh?
  • hyponym(“animal”) NEAR “Winnie the Pooh”
• When was television invented?
  • instance-of(“time”) NEAR “television” NEAR synonym(“invented”)

FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
• Ground constants: Winnie the Pooh, television
• Atypes: animal, time
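The query template above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `IS_A` table, the window size, and the function names are invented for the example, and a real system would consult WordNet or a similar hierarchy instead.

```python
# Toy IS-A hierarchy (hypothetical; stands in for WordNet or TAP).
IS_A = {
    "bear": "animal",
    "honey": "food",
    "animal": "entity",
    "food": "entity",
}

def is_a(token, atype):
    """True if atype is reached by following IS-A links upward from token."""
    node = token
    while node in IS_A:
        node = IS_A[node]
        if node == atype:
            return True
    return False

def find_answers(tokens, ground_constants, atype, window=6):
    """FIND x NEAR ground constants WHERE x IS-A atype (hard 0/1 match)."""
    lowered = [t.lower() for t in tokens]
    anchors = [i for i, t in enumerate(lowered) if t in ground_constants]
    hits = []
    for i, t in enumerate(lowered):
        near = any(abs(i - j) <= window for j in anchors)
        if near and is_a(t, atype):
            hits.append(tokens[i])
    return hits

# "What animal is Winnie the Pooh?" with an invented passage.
passage = "Winnie the Pooh is a bear who loves honey".split()
answers = find_answers(passage, {"winnie", "pooh"}, "animal")
print(answers)
```

Both NEAR and IS-A are hard 0/1 tests in this sketch; the later slides replace them with learned, soft matches.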


A relational view of QA

Entity class or atype may be expressed by
• A finite IS-A hierarchy (e.g. WordNet, TAP)
• A surface pattern matching infinitely many strings (e.g. “digit+”, “Xx+”, “preceded by a preposition”)

Match selectors, specialize atype to answer tokens

[Diagram: question words are divided into atype clues and selectors. Selectors: direct syntactic match against the answer passage; limit search to certain rows. Atype clues: entity class (attribute or column name), matched via IS-A; locate which column to read, the “answer zone”.]


Benefits of the relational view

“Scaling up by dumbing down”
• Next stop after vector-space
• Far short of real knowledge representation and inference
• Barely getting practical at (near) Web scale

Can set up as a learning problem: train with questions (query logs) and answers in context

Transparent, self-tuning, easy to deploy
• Feature extractors used in entity taggers
• Relational/graphical learning on features


What TREC QA feels like

How to assemble chunker, parser, POS and NE tagger, WordNet, WSD, … into a QA system?

Experts get much insight from old QA pairs

• Matching an upper-cased term adds a 60% bonus … for multi-words terms and 30% for single words

• Matching a WordNet synonym … discounts by 10% (lower case) and 50% (upper case)

• Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%


Talk outline

Relational interpretation of QA
Motivation for a “clean-room” IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
• Can handle prominent finite set of atypes: person, place, time, measurements,…
Extending to arbitrary atype specializations
• Required for what… and which… questions
Ongoing work and concluding remarks


Feature + Soft match

FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
No fixed question or answer type system
Convert “x IS-A Atype(question)” to a soft match “DoesAtypeMatch(x, question)”

[Diagram: IE-style surface feature extractors turn the question into a question feature vector; IE-style surface feature extractors and WordNet hypernym feature extractors turn answer tokens in the passage into a snippet feature vector; learn joint distrib. over the two vectors.]


Feature extraction: Intuition

How fast can a cheetah run?
A cheetah can chase its prey at up to 90 km/h

How fast does light travel?
Nothing moves faster than 186,000 miles per hour, the speed of light

[Figure: question word sequences (how fast, how many, how far, how rich; who wrote, who first) linked to answer-token features: rate#n#2, abstraction#n#6, NNS, magnitude_relation#n#1, mile#n#3, linear_unit#n#1, measure#n#3, definite_quantity#n#1, paper_money#n#1, currency#n#1; writer, composer, artist, musician; NNP, person; explorer.]


Feature extractors

Question features: 1, 2, 3-token sequences starting with standard wh-words
Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos,…
Passage WordNet features: all noun hypernym ancestors of all senses of token

Get top 300 passages from IR engine
For each token invoke feature extractors
Label = 1 if token is in answer span, 0 o/w
Question vector $x_q$, passage vector $x_p$
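A rough sketch of the question and surface feature extractors listed on this slide. The slide gives only the feature names, so the exact definitions below (the regular expressions, the wh-word list, and the half-open answer span) are assumptions made for illustration. The WordNet hypernym features and the lpos/rpos features are omitted.

```python
import re

# Assumed list of "standard" wh-words; the slide does not enumerate them.
WH_WORDS = {"what", "which", "who", "whom", "whose",
            "when", "where", "why", "how", "name"}

def question_features(question):
    """1-, 2-, and 3-token sequences starting at the first wh-word."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    for i, tok in enumerate(tokens):
        if tok in WH_WORDS:
            return ["_".join(tokens[i:i + n])
                    for n in (1, 2, 3) if i + n <= len(tokens)]
    return []

def surface_features(token):
    """Boolean surface features of one passage token (definitions assumed)."""
    return {
        "hasCap": any(c.isupper() for c in token),
        "hasXx": re.fullmatch(r"[A-Z][a-z]+", token) is not None,
        "isAbbrev": re.fullmatch(r"(?:[A-Za-z]\.)+", token) is not None,
        "hasDigit": any(c.isdigit() for c in token),
        "isAllDigit": token.isdigit(),
    }

def token_labels(tokens, answer_span):
    """Label = 1 for tokens inside the half-open answer span, 0 otherwise."""
    start, end = answer_span
    return [1 if start <= i < end else 0 for i in range(len(tokens))]

q_feats = question_features("How fast can a cheetah run?")
labels = token_labels(["up", "to", "90", "km/h"], (2, 4))
```

Here `q_feats` holds the three wh-anchored sequences for the cheetah question, in the same underscore-joined style as the `how_far` and `what_city` features named elsewhere in the deck.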


Preliminary likelihood ratio tests

[Figure: two panels, “Surface patterns” and “WordNet hypernyms”.]


A simple, flat conditional model

Let $x = x_q \otimes x_p$ (pairwise product of elems)
Model $\Pr(Y=1 \mid x) = \exp(w \cdot x) / (1 + \exp(w \cdot x))$
For every question-feature, passage-feature pair, w has a parameter
Expect to perform better than “linear” model $x = (x_p, x_q)$
Can discount for redundancy in pair info
If $x_q$ ($x_p$) is fixed, what $x_p$ ($x_q$) will yield the largest $\Pr(Y=1 \mid x)$? (linear iceberg query)

[Figure: question features (how_far, when, what_city) paired with passage features (region#n#3, entity#n#1).]
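The pairwise-product model on this slide can be written out directly. In the sketch below the feature names come from the deck, but the weight values are invented for illustration and are not parameters reported in the talk.

```python
import math

def pair_features(xq, xp):
    """One entry per (question feature, passage feature) pair, row-major."""
    return [q * p for q in xq for p in xp]

def prob_answer(w, xq, xp):
    """Pr(Y=1|x) = exp(w.x) / (1 + exp(w.x)), with x the pairwise product."""
    z = sum(wi * xi for wi, xi in zip(w, pair_features(xq, xp)))
    return 1.0 / (1.0 + math.exp(-z))

# Question features: [how_far, when]; passage features: [hasDigit, region#n#3]
xq = [1.0, 0.0]          # the question starts with "how far"
xp_number = [1.0, 0.0]   # candidate token contains a digit
xp_region = [0.0, 1.0]   # candidate token is a kind of region

# Hypothetical weights, one per pair, in the same row-major order.
w = [2.0, -1.0,   # how_far x hasDigit, how_far x region#n#3
     1.5, -0.5]   # when x hasDigit,    when x region#n#3

p_number = prob_answer(w, xq, xp_number)
p_region = prob_answer(w, xq, xp_region)
```

With these made-up weights a digit-bearing token scores above 0.5 for a "how far" question and a region token scores below it. A purely linear model on the concatenation $(x_p, x_q)$ cannot express that kind of interaction between question and passage features.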


Classification accuracy

[Figure: two plots comparing the Linear and Quadratic models: true positive vs. false positive rate (false positive axis 0–0.2), and precision vs. recall.]

Pairing more accurate than linear model
Steep learning curve; linear never “gets it” beyond “prior” atypes like proper nouns (common in TREC)
Are the estimated w parameters meaningful?


Parameter anecdotes

Surface and WordNet features complement each other

General concepts get negative params: use in predictive annotation

Learning is symmetric (QA)


Query-driven information extraction

“Basis” of atypes $A$; $a \in A$ could be a synset, a surface pattern, feature of a parse tree
Question q “projected” to vector $(w_a : a \in A)$ in atype space via learning conditional model
E.g. if q is “when…” or “how long…”, $w_{\text{hasDigit}}$ and $w_{\text{time\_period\#n\#1}}$ are large, $w_{\text{region\#n\#1}}$ is small
Each corpus token t has associated indicator features $\phi_a(t)$ for every $a \in A$
E.g. $\phi_{\text{hasDigit}}(\text{3,000}) = \phi_{\text{is-a(region\#n\#1)}}(\text{Japan}) = 1$
Can also learn [0,1] value of is-a proximity


Single token scoring

A token t is a candidate answer if $\sum_{a \in A} \phi_a(t)\, w_a(q) > 0$
• $\phi_a(t)$: atype indicator features of the token
• $w_a(q)$: projection of question to “atype space”

$H_q(t)$: Reward tokens appearing “near” selectors matched from question
• 0/1: appears within fixed window with selector/s
• Activation in linear token sequence model
• Proximity in chunk sequences, parse trees,…

Order tokens by decreasing $H_q(t) \sum_{a \in A} \phi_a(t)\, w_a(q)$

Example passage: …the armadillo, found in Texas, is covered with strong horny plates
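A small sketch of single-token scoring as this slide describes it: an atype match score (the token's indicator features weighted by the question's projection into atype space), multiplied by a proximity reward for being near a matched selector. Only the 0/1 window variant of the proximity term is shown. The feature table, the weights, and the implied question ("where are armadillos found?") are all hypothetical.

```python
def atype_score(phi, w):
    """Sum over atypes a of phi_a(t) * w_a(q); missing weights count as 0."""
    return sum(value * w.get(a, 0.0) for a, value in phi.items())

def proximity(index, selector_positions, window=5):
    """0/1 proximity reward: 1 if a matched selector lies within the window."""
    return 1 if any(abs(index - j) <= window for j in selector_positions) else 0

def rank_tokens(tokens, features, w, selector_positions, window=5):
    """Keep tokens with positive atype score; order by proximity * score."""
    scored = []
    for i, tok in enumerate(tokens):
        if i in selector_positions:
            continue  # selector tokens themselves are not candidate answers
        s = atype_score(features.get(tok, {}), w)
        if s > 0:
            scored.append((proximity(i, selector_positions, window) * s, tok))
    return sorted(scored, reverse=True)

tokens = ["the", "armadillo", "found", "in", "Texas", "is",
          "covered", "with", "strong", "horny", "plates"]
# Hypothetical indicator features for two tokens; all others have none.
features = {
    "Texas": {"hasCap": 1, "region#n#3": 1},
    "plates": {"artifact#n#1": 1},
}
# Hypothetical projection of a "where ..." question into atype space.
w_where = {"hasCap": 0.5, "region#n#3": 1.0, "artifact#n#1": -0.8}

ranked = rank_tokens(tokens, features, w_where, selector_positions={1})
```

With these invented numbers only "Texas" survives: it has a positive atype score and sits three tokens from the matched selector "armadillo".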


Mean reciprocal rank (MRR)

$n_q$ = smallest rank among answer passages
$\text{MRR} = (1/|Q|) \sum_{q \in Q} (1/n_q)$
• Dropping passage from #1 to #2 as bad as dropping it from #2 to $\infty$
TREC requires MRR5: round up $n_q > 5$ to $\infty$
• Improving rank from 20 to 6 as useless as improving it from 20 to 15
Aggregate score influenced by many complex subsystems
• Complete description rarely available
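MRR and the MRR5 variant are easy to compute once the rank of the first answer-bearing passage is known for each question. The sketch below uses `None` for a question with no answer passage at all; that convention and the example ranks are assumptions for illustration.

```python
import math

def mrr(first_answer_ranks, cutoff=None):
    """Mean reciprocal rank; ranks beyond the cutoff (or None) count as infinity."""
    total = 0.0
    for n in first_answer_ranks:
        if n is None or (cutoff is not None and n > cutoff):
            n = math.inf
        total += 1.0 / n  # 1 / infinity evaluates to 0.0
    return total / len(first_answer_ranks)

ranks = [1, 2, 6, None]        # made-up ranks for four questions
plain = mrr(ranks)             # the rank-6 answer still earns 1/6
mrr5 = mrr(ranks, cutoff=5)    # under MRR5 the rank-6 answer earns nothing
```

The two bullet points on the slide follow from the arithmetic: moving from rank 1 to rank 2 loses 1/2, exactly what moving from rank 2 to infinity loses, and under MRR5 any rank above 5 contributes zero.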


Effect of eliminating non-answers

300 top IR score hits
If Pr(Y=1|token) < threshold reject token
All tokens rejected then reject passage
Present survivors in IR order

[Figure: rank after filtering vs. IR rank (both 0–300), for TREC 2000 and TREC 2002.]
[Figure: MRR, MRR5 and baseline vs. acceptance threshold (0–0.5). TREC 2000 panel: labeled values 0.491 and 0.336. TREC 2002 panel: labeled values 0.334 and 0.224.]
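The filtering step on this slide amounts to a single pass over the IR-ranked passages. A minimal sketch, assuming each passage already carries the per-token probabilities from the conditional model (the passage ids and probabilities here are made up):

```python
def filter_passages(passages, threshold):
    """passages: list of (passage_id, [Pr(Y=1|token), ...]) in IR rank order.

    A token is rejected when its probability is below the threshold; a
    passage is rejected when all of its tokens are. Survivors keep IR order.
    """
    survivors = []
    for passage_id, token_probs in passages:
        if any(p >= threshold for p in token_probs):
            survivors.append(passage_id)
    return survivors

ir_ranked = [
    ("p1", [0.02, 0.05]),
    ("p2", [0.10, 0.70]),
    ("p3", [0.01]),
    ("p4", [0.30, 0.20]),
]
kept = filter_passages(ir_ranked, threshold=0.25)
```

Because survivors stay in IR order, an answer passage can only move up in rank, by however many non-answer passages ahead of it were rejected.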


Drill-down and ablation studies

Scale average MRR improvement to 1
• What, Which < average
• Who > average

Atype of what… and which… not captured well by 3-grams starting at wh-words
Atype ranges over essentially infinite set with relatively little training data

[Figure: TREC 2002, relative MRR gain (axis 0.8–1.2) by question type: what, which, name, where, how, when, who.]


Talk outline

Relational interpretation of QA
Motivation for a “clean-room” IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
• Can handle prominent finite set of atypes: person, place, time, measurements,…
Extending to arbitrary atype specializations
• Required for what… and which… questions
Ongoing work and concluding remarks


What…, which…, name… atype clues

Assumption: Question sentence has a wh-word and a main/auxiliary verb

Observation: Atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb

Heuristic: Atype clue = head of this NP
• Use a shallow parser and apply rule

Head can have attributes
• Which (American (general)) is buried in Salzburg?
• Name (Saturn’s (largest (moon)))
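The head-of-NP heuristic can be approximated on POS-tagged input without a full shallow parser. The sketch below is a simplification, not the rule used in the talk: it assumes Penn-style tags supplied by hand, treats a run of determiner, adjective, noun, possessive, and number tags as an NP, and takes the last noun in that run as the head.

```python
NP_TAGS = ("NN", "JJ", "DT", "POS", "CD")  # assumed tag prefixes inside an NP

def in_np(tag):
    return tag.startswith(NP_TAGS)

def atype_clue(tagged):
    """tagged: list of (word, POS tag). Returns the head noun of the NP
    adjoining the first main/auxiliary verb, or None if there is none."""
    verb = next((i for i, (_, tag) in enumerate(tagged)
                 if tag.startswith(("VB", "MD"))), None)
    if verb is None:
        return None
    # NP immediately to the left of the verb ...
    left = []
    i = verb - 1
    while i >= 0 and in_np(tagged[i][1]):
        left.insert(0, tagged[i])
        i -= 1
    # ... otherwise the NP immediately to its right.
    right = []
    i = verb + 1
    while i < len(tagged) and in_np(tagged[i][1]):
        right.append(tagged[i])
        i += 1
    chunk = left or right
    nouns = [word for word, tag in chunk if tag.startswith("NN")]
    return nouns[-1] if nouns else None

which_q = [("Which", "WDT"), ("American", "JJ"), ("general", "NN"),
           ("is", "VBZ"), ("buried", "VBN"), ("in", "IN"), ("Salzburg", "NNP")]
name_q = [("Name", "VB"), ("Saturn", "NNP"), ("'s", "POS"),
          ("largest", "JJS"), ("moon", "NN")]

clue_which = atype_clue(which_q)
clue_name = atype_clue(name_q)
```

On the two slide examples this picks "general" and "moon"; the attributes ("American", "Saturn's largest") stay in the chunk and could be kept as modifiers of the head.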


Atype clue extraction stats

Question type   #Questions   #Extracted correctly
what            630          612
which           29           28
name            23           20

Simple heuristic quite effective
If successful, extracted atype is mapped to WordNet synset (moon → celestial body etc.)
If no atype of this form available, try the “self-evident” atypes (who, when, where, how_X etc.)

New boolean feature for candidate token: is token hyponym of atype synset?


The last piece: Learning selectors

Which question words are likely to appear (almost) unchanged in an answer passage?
• Constants in select-clauses of SQL queries
• Guides backoff policy for keyword query

Local and global features
• POS of word, POS of adjacent words, case info, proximity to wh-word
• Suppose word is associated with synset set S
  • NumSense: size of S (how polysemous is the word?)
  • NumLemma: average #lemmas describing s ∈ S

[Figure labels: POS@-1, POS@0, POS@1]
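The two global features NumSense and NumLemma can be computed from any word-to-synsets table. The table below is invented for illustration and does not reproduce real WordNet entries; with WordNet, the synsets and their lemma lists would come from the WordNet database instead.

```python
# Hypothetical synset table: word -> list of synsets, each a list of lemmas.
SYNSETS = {
    "needle": [
        ["needle"],
        ["acerate_leaf", "needle"],
        ["phonograph_needle", "needle"],
    ],
}

def num_sense(word):
    """Size of the synset set S for the word (how polysemous is it?)."""
    return len(SYNSETS.get(word.lower(), []))

def num_lemma(word):
    """Average number of lemmas describing each synset s in S."""
    synsets = SYNSETS.get(word.lower(), [])
    if not synsets:
        return 0.0
    return sum(len(s) for s in synsets) / len(synsets)

senses = num_sense("Needle")
lemmas = num_lemma("Needle")
```

Intuitively, a word with few senses and few alternative lemmas has little room for paraphrase, so it is more likely to reappear unchanged in an answer passage.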


Selector results

Global features (IDF, NumSense, NumLemma) essential for accuracy
• Best F1 accuracy with local features alone: 71–73%
• With local and global features: 81%

Decision trees better than logistic regression
• F1=81% as against LR F1=75%
• Intuitive decision branches
• But logistic regression gives scores for query backoff

[Figure: decision tree fragment with branches NumLemma@0<=2.5 / NumLemma@0>2.5, NumSense@0<=9 / NumSense@0>9, POS@-1=Noun, POS@0=Adj, POS@-1=Noun, NumLemma@0<=1.82 / NumLemma@0>1.82, POS@0=Verb.]


Putting together a QA system

[Diagram: QA system surrounded by its building blocks: WordNet, POS tagger, training corpus, shallow parser, learning tools, N-E tagger.]


Putting together a QA system

[Diagram components:
• Corpus: sentence splitter, passage indexer, passage index
• Question: tokenizer, POS tagger, tagged question; shallow parser, noun and verb markers; atype extractor, atype clues; selector learner
• Keyword query generator, keyword query, candidate passage
• Candidate passage: tokenizer, POS tagger, entity extractor, tagged passage
• Logistic regression (Is QA pair?), reranked passages]

Learning to rerank passages
Sample features:
• Do selectors match? How many?
• Is some non-selector passage token a specialization of the question’s atype clue?
• Min, avg, linear token distance between candidate token and matched selectors


Learning to re-rank passages

Remove passage tokens matching selectors
• User already knows these are in passage
Find passage token/s specializing atype
For each candidate token collect
• Atype of question, original rank of passage
• Min, avg linear distances to matched selectors
• POS and entity tag of token if available

Example
Question: How many inhabitants live in the town of Ushuaia
Passage: Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and …
[Figure annotations: selector match; surface pattern hasDigits; WordNet match; 5 tokens apart; 1]
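The distance features for a candidate token can be sketched on the Ushuaia example from this slide. The tokenizer (whitespace split with edge punctuation stripped) and the choice of "ushuaia" as the only matched selector are assumptions for illustration.

```python
def tokenize(text):
    """Whitespace tokens with leading/trailing punctuation stripped."""
    stripped = (t.strip(",.?!") for t in text.split())
    return [t for t in stripped if t]

def selector_distances(tokens, candidate_index, selectors):
    """Return (min, avg) linear token distance from the candidate token to
    every matched selector occurrence, or None if no selector matches."""
    positions = [i for i, t in enumerate(tokens) if t.lower() in selectors]
    if not positions:
        return None
    dists = [abs(candidate_index - i) for i in positions]
    return min(dists), sum(dists) / len(dists)

passage = tokenize(
    "Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and"
)
candidate = passage.index("30,000")
result = selector_distances(passage, candidate, {"ushuaia"})
```

Under this tokenization the candidate "30,000" is five tokens from the selector match, in line with the "5 tokens apart" annotation on the slide.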


Re-ranking results

Categorical and numeric attributes
Logistic regression
Good precision, poor recall
Use logit score to re-rank passages
Rank of first correct passage shifts substantially

[Figure: frequency (log scale, 1–1000) of answer at rank 1–10, Baseline vs. Rerank; labeled values 194 and 479.]
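Re-ranking by logit score reduces to sorting passages on a weighted sum of their features. In the sketch below the feature names echo the sample features listed earlier in the deck, while the weights and feature values are invented, not learned parameters from the talk.

```python
def logit_score(weights, bias, features):
    """Log-odds from a logistic regression model: bias + w . features."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())

def rerank(passages, weights, bias=0.0):
    """passages: list of (passage_id, feature dict) in baseline order.
    Returns the passages sorted by decreasing logit score."""
    return sorted(passages,
                  key=lambda p: logit_score(weights, bias, p[1]),
                  reverse=True)

# Hypothetical weights: matches help, distance to selectors hurts.
weights = {"selector_matches": 0.8, "atype_match": 1.5, "min_distance": -0.1}

baseline = [
    ("p1", {"selector_matches": 1, "atype_match": 0, "min_distance": 12}),
    ("p2", {"selector_matches": 2, "atype_match": 1, "min_distance": 5}),
    ("p3", {"selector_matches": 1, "atype_match": 1, "min_distance": 20}),
]
order = [pid for pid, _ in rerank(baseline, weights)]
```

Sorting on the raw log-odds gives the same order as sorting on the probability, since the logistic function is monotonic.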


MRR gains from what, which, name

Substantial gain in MRR

What/which now show above-average MRR gains

TREC 2000 top MRRs: 0.76 0.71 0.46 0.46 0.31

Ranking strategy             TREC 2000   TREC 2002
IR score (Lucene)            0.377       0.249
Conditional model            0.491       0.334
Atype for what/which/name    0.71        0.565

[Figure: MRR (axis 0–0.8) by question type (when, what, where, how, which, how many, how much), pre-reranking vs. post-reranking.]


Generalization across corpora

Across-year numbers close to train/test split on a single year

Features and model seem to capture corpus-independent linguistic Q+A artifacts


Conclusion

Clean-room QA = feature extraction + learning
• Recover structure info from question
• Learn correlations between question structure and passage features
Competitive accuracy with negligible domain expertise or manual intervention
Ongoing work
• Model how selector and atype are related
• Model coefficients to predictive annotation
• Combine token scores to better passage scores
• Treat all question types uniformly
• Use redundancy available from the Web
