QUALIFIER in TREC-12 QA Main Task Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua School of Computing National University of Singapore Email: [email protected]


Page 1: QUALIFIER in TREC-12 QA Main Task

QUALIFIER in TREC-12 QA Main Task

Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua
School of Computing
National University of Singapore
Email: [email protected]

Page 2: QUALIFIER in TREC-12 QA Main Task

Outline

  • Introduction
  • Factoid Subsystem
  • List Subsystem
  • Definition Subsystem
  • Result
  • Conclusion and Future Work

Page 3: QUALIFIER in TREC-12 QA Main Task

Introduction

Given a question and a large text corpus, return an “answer” rather than relevant “documents”.

QA is at the intersection of IR + IE + NLP.

Our system: QUALIFIER
  • Consists of 3 subsystems
  • External resources: Web, WordNet, Ontology
  • Event-based Question Answering
  • New modules introduced

Page 4: QUALIFIER in TREC-12 QA Main Task

Outline

  • Introduction
  • Factoid Subsystem
  • List Subsystem
  • Definition Subsystem
  • Result
  • Conclusion and Future Work

Page 5: QUALIFIER in TREC-12 QA Main Task

Factoid System Overview

[Figure: factoid system architecture. Labelled components: Question (definition, factoid, list); Question Analysis (Question Classification, Query Parsing); original query terms; q class; QA Event Analysis; Web Pre-retrieval Documents; Snippets; WordNet; Ontology; Structured Query; Document Retrieval; TREC Corpus; Documents; Passage Retrieval (Association Rules, N Sentence Window); Anaphora Resolution; Refined Documents; Answer Extraction; Named Entity; Answer Justification; canonicalization resolution; Answer Selection; SCR; Answer.]

Page 6: QUALIFIER in TREC-12 QA Main Task

Factoid Subsystem

  • Detailed Question Analysis
  • QA Event Construction
  • QA Event Mining
  • Answer Selection
  • Answer Justification
  • Fine-grained Named Entity Recognition
  • Anaphora Resolution
  • Canonicalization Coreference
  • Successive Constraint Relaxation

Page 8: QUALIFIER in TREC-12 QA Main Task

Why Event-based QA - I

The world consists of two basic types of things, entities and events, and people often ask questions about them.

From the point of view of Question Answering: questions = “enquiries about entities or events”.

Page 9: QUALIFIER in TREC-12 QA Main Task

Why Event-based QA - II

QA Entities
  • “Anything having existence (living or nonliving)”
  • E.g. “What is the democratic party symbol?”

QA Events
  • “Something that happens at a given place and time”
  • E.g. “How did donkey become democratic party symbol?”

[Figure: Thomas Nast, 1870 Harper’s Weekly cartoon]

Page 10: QUALIFIER in TREC-12 QA Main Task

Why Event-based QA - III

Entity Questions
  • Properties, or entities themselves
  • Definition questions

Event Questions
  • Elements of events: Location, Time, Subject, Object, Quantity, Description, Action, etc.

Table 1: Correspondence of WH-Questions & Event Elements

WH-Question   QA Event Elements
Who           Subject, Object
Where         Location
When          Time
What          Subject, Object, Description, Action
Which         Subject, Object
How           Quantity, Description

question        :== event | event_element | entity | entity_property
event           :== { event_element }
event_element   :== time | location | subject | object | quantity | description | action | other
entity          :== object | subject
entity_property :== quantity | description | other
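The correspondence in Table 1 can be read as a simple lookup from the WH-word to the event elements a question is likely asking for. The sketch below is only an illustration of that reading; the helper name `expected_elements` and the naive tokenisation are assumptions, not part of QUALIFIER.

```python
# Table 1 as a lookup: WH-word -> candidate QA event elements.
WH_TO_ELEMENTS = {
    "who": ["subject", "object"],
    "where": ["location"],
    "when": ["time"],
    "what": ["subject", "object", "description", "action"],
    "which": ["subject", "object"],
    "how": ["quantity", "description"],
}


def expected_elements(question):
    """Return the candidate event elements for the first WH-word found.

    Hypothetical helper: it splits on whitespace and ignores everything
    except the first WH-word, which is far cruder than real question analysis.
    """
    for token in question.lower().split():
        word = token.strip("?,.\"'")
        if word in WH_TO_ELEMENTS:
            return WH_TO_ELEMENTS[word]
    return []


print(expected_elements("When did Bob Marley die?"))  # ['time']
```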

Page 11: QUALIFIER in TREC-12 QA Main Task

Event-based QA Hypothesis

  • Equivalency: for QA events Ei and Ej, if all_elements(Ei) = all_elements(Ej), then Ei = Ej, and vice versa.
  • Generality: if all_elements(Ei) is a subset of all_elements(Ej), then Ei is more general than Ej.
  • Cohesiveness: if elements a and b both belong to an event Ei, and a and c do not belong to a known event, then co-occurrence(a, b) is greater than co-occurrence(a, c).
  • Predictability: if elements a and b both belong to an event Ei, then a => b and b => a.
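The Equivalency and Generality hypotheses translate directly into set comparisons if an event is modelled as the set of its known elements. The following is a minimal sketch under that assumption; the function names and the example elements (taken loosely from the Thomas Nast example earlier in the deck) are illustrative only.

```python
def equivalent(e_i, e_j):
    """Equivalency: two events are the same iff their element sets are equal."""
    return set(e_i) == set(e_j)


def more_general(e_i, e_j):
    """Generality: e_i is more general than e_j if its elements are a subset of e_j's."""
    return set(e_i) <= set(e_j)


# Illustrative element sets, not data from the system.
nast_full = {"Thomas Nast", "1870", "Harper's Weekly", "donkey"}
nast_partial = {"Thomas Nast", "donkey"}

print(more_general(nast_partial, nast_full))  # True
print(equivalent(nast_partial, nast_full))    # False
```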

Page 12: QUALIFIER in TREC-12 QA Main Task

QA Event Space

Consider an event to be a point in a multi-dimensional QA event space.

If we know all the elements of an event, we can easily answer different questions about it.
  • E.g. “When did Bob Marley die?”

Because elements that belong to the same event are innately associated (Cohesiveness), we can use what is already known:
  • To narrow the search scope
  • To find the rest of the unknown event elements, i.e. the answer (Predictability)

Page 13: QUALIFIER in TREC-12 QA Main Task

Problems to be Solved

In most cases, however, it is difficult to find the correct unknown element(s), i.e. the correct answer.

Two major problems:
  • Insufficient known elements
  • Inexact known elements

Solution:
  • Explore the use of world knowledge (Web and WordNet glosses) to find more known elements
  • Exploit lexical knowledge (WordNet synsets and morphemics) to find exact forms

Page 14: QUALIFIER in TREC-12 QA Main Task

How to Find a QA Event

Using the Web
  • From the original query terms $q^{(0)}$, retrieve the top N Web documents.
  • For each $q_i^{(0)} \in q^{(0)}$, extract nearby non-trivial words, in the same sentence or within n words (collected in $C_q$), and rank them by their probability of correlation with $q_i^{(0)}$:

$$weight(t_{ik}) = \frac{d_s(t_{ik} \wedge q_i^{(0)})}{d_s(t_{ik} \vee q_i^{(0)})}$$

Using WordNet
  • For each $q_i^{(0)} \in q^{(0)}$, extract terms that are lexically related to $q_i^{(0)}$ by locating them in the gloss $G_q$ and the synset $S_q$.

Combine the external knowledge resources to form the term collection:

$$K_q = C_q + (G_q \cup S_q)$$
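As a rough illustration of the Web-based term weighting, the sketch below assumes that the weight of a candidate term is the number of snippets containing both the term and the query term, divided by the number containing either. That reading of the weight, the function name `term_weight`, and the toy snippets are all assumptions made for the example.

```python
def term_weight(term, query_term, snippets):
    """Co-occurrence ratio: snippets with both terms / snippets with either term.

    `snippets` is a list of token sets. This ratio is an assumed reading
    of the slide's weight, not a confirmed definition.
    """
    both = sum(1 for s in snippets if term in s and query_term in s)
    either = sum(1 for s in snippets if term in s or query_term in s)
    return both / either if either else 0.0


# Toy snippets, invented for illustration.
snippets = [
    {"mississippi", "river", "soto", "1541"},
    {"mississippi", "river", "delta"},
    {"soto", "explorer", "spanish"},
]

print(term_weight("river", "mississippi", snippets))  # 1.0
```

Terms with a high ratio would be kept as additional known elements; a real system would also need stop-word filtering and the sentence or n-word window mentioned on the slide.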

Page 15: QUALIFIER in TREC-12 QA Main Task

QA Event Construction

Structured Query Formulation
  • We perform structural analysis on $K_q$ to form semantic groups of terms.
  • Given any two distinct terms $t_i, t_j \in K_q$, we compute their:
      • Lexical correlation
      • Co-occurrence correlation
      • Distance correlation

Page 16: QUALIFIER in TREC-12 QA Main Task

QA Event Construction

For example, “What Spanish explorer discovered the Mississippi River?”

[Figure: the QA event “Mississippi” and its event elements: Mississippi; Spanish, French; Hernando, De, Soto; 1541; first, river, European.]

The final Boolean query becomes: “(Mississippi) & (French|Spanish) & (Hernando & Soto & De) & (1541) & (explorer) & (first | European |river)”.
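The Boolean query above can be assembled mechanically once the terms have been placed into semantic groups. The sketch below assumes each group carries its own joining operator (`|` for alternatives, `&` for the parts of a multi-word name); the function name and data layout are hypothetical, and the output spacing is normalised rather than copied from the slide.

```python
def build_boolean_query(groups):
    """Join each semantic group with its operator, then AND the groups together.

    `groups` is a list of (terms, operator) pairs, where operator is "&" or "|".
    """
    clauses = []
    for terms, op in groups:
        clauses.append("(" + f" {op} ".join(terms) + ")")
    return " & ".join(clauses)


groups = [
    (["Mississippi"], "&"),
    (["French", "Spanish"], "|"),
    (["Hernando", "Soto", "De"], "&"),
    (["1541"], "&"),
    (["explorer"], "&"),
    (["first", "European", "river"], "|"),
]

print(build_boolean_query(groups))
```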

Page 17: QUALIFIER in TREC-12 QA Main Task

QA Event Mining

Extract important association rules among the elements by using data mining techniques.

Given a QA event Ei, we define X, Y as two sets of event elements.

Event mining studies rules of the form $X \rightarrow Y$, where X and Y are QA event element sets, $X \cap Y = \emptyset$, and $Y \cap \{element_{original}\} = \emptyset$.
  • If $X \cap Y \neq \emptyset$, ignore $X \rightarrow Y$.
  • If cardinality(Y) > 1, ignore $X \rightarrow Y$.
  • If $Y \cap \{element_{original}\} \neq \emptyset$, ignore $X \rightarrow Y$.
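The three filtering conditions on candidate rules can be sketched as a single predicate. The set operators were lost in extraction, so the example assumes they are intersections: X and Y must be disjoint, Y must not contain more than one element, and Y must not overlap the original question elements. The name `keep_rule` is hypothetical.

```python
def keep_rule(x, y, original):
    """Return True if the rule X -> Y survives the three filters.

    Assumed reading: the lost operators are set intersections.
    """
    x, y, original = set(x), set(y), set(original)
    if x & y:          # X and Y overlap
        return False
    if len(y) > 1:     # consequent has more than one element
        return False
    if y & original:   # consequent is already a known question element
        return False
    return True


question_terms = {"Mississippi", "explorer"}
print(keep_rule({"Mississippi", "explorer"}, {"1541"}, question_terms))  # True
print(keep_rule({"1541"}, {"Mississippi"}, question_terms))              # False
```

Under this reading only rules that predict a single, not yet known element are kept, which matches the goal of finding the unknown answer element.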

Page 18: QUALIFIER in TREC-12 QA Main Task

Passage & Answer Selection

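Select passages based on the Answer Event Score (AES) from the relevant documents in the QA corpus.

[Equation: Support(X → Y). Only the fragments d_w, window, original, anded and exp survive; the formula is not recoverable.]

$$Confidence(X \rightarrow Y) = \frac{d_w(X \wedge Y)}{d_w(X)}$$

[Equation: AES(P), involving M_ele, N_ele and a sum over rules i = 1 … N_r of Support(rule_i) and Confidence(rule_i). The exact form is not recoverable.]

The weight for answer candidate j is defined as:

[Equation: weight(j), in terms of AES(P) and Support(rule_i) for rules with Y = j. The exact form is not recoverable.]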

Page 19: QUALIFIER in TREC-12 QA Main Task

Related Modules: Fine-grained Named Entity Recognition

Fine-grained NE Tagging
  • Non-ASCII Character Remover
  • Number Format Converter, e.g. “one hundred eleven” => 111
  • Rule Conflict Resolver
      • Longer Length
      • Ontology
      • Handcrafted Priorities

HUMAN: Basic, Organization, Person
TIME: Basic, Day, Month, Year
LOCATION: Basic, Body, City, Continent, Country, County, Island, Lake, Mountain, Ocean, Planet, Province, River, Town
NUMBER: Basic, Age, Area, Count, Degree, Distance, Frequency, Money, Percent, Period, Range, Size, Speed
CODE: URL, Telephone, Post code, Email address, Product index
OBJECT: Basic, Animal, Breed, Color, Currency, Entertainment, Game, Language, Music, Plant, Profession, Religion, War, Works
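To make the Number Format Converter step concrete, here is a small sketch of converting spelled-out English numbers to integers. It is an assumed, simplified implementation (cardinals up to the millions, no ordinals or fractions) and not the converter used in QUALIFIER; the name `words_to_number` is hypothetical.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    """Convert a spelled-out cardinal such as 'one hundred eleven' to an int."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being read
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("one hundred eleven"))  # 111
```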

Page 20: QUALIFIER in TREC-12 QA Main Task

Related Modules: Answer Justification

We generate axioms based on our manually constructed ontology. For example:
  • q1425: What is the population of Maryland?
  • Sentence: “Maryland's population is 50,000 and growing rapidly.”
  • Ontology Axiom (OA): Maryland(c1) & population(c1, c2) -> 5000000(c2)

In this way, we can identify “50000”, the answer given by the surface text, as wrong.

Page 21: QUALIFIER in TREC-12 QA Main Task

Factoid Results

Run 1: focus on answer coverage
  • With anaphora resolution, more successive constraint relaxation loops

Run 2: focus on answer precision
  • Without anaphora resolution, fewer successive constraint relaxation loops

Page 22: QUALIFIER in TREC-12 QA Main Task

Factoid Results

Run  # correct  # unsupported  # inexact  # wrong  Accuracy  NIL precision  NIL recall
1    232        24             13         144      0.562     0.160          0.400
2    225        20             12         156      0.545     0.158          0.767
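The reported accuracies are consistent with dividing the number of correct answers by the sum of the four judgement counts. The short check below assumes that the four counts cover every factoid question in a run; the dictionary layout is only for illustration.

```python
# Judgement counts per run, as reported in the results table.
runs = {
    1: {"correct": 232, "unsupported": 24, "inexact": 13, "wrong": 144},
    2: {"correct": 225, "unsupported": 20, "inexact": 12, "wrong": 156},
}


def accuracy(counts):
    """Fraction of questions judged correct (assumes the counts are exhaustive)."""
    return counts["correct"] / sum(counts.values())


for run, counts in runs.items():
    print(run, sum(counts.values()), round(accuracy(counts), 3))
# 1 413 0.562
# 2 413 0.545
```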

Page 23: QUALIFIER in TREC-12 QA Main Task

Outline

  • Introduction
  • Factoid Subsystem
  • List Subsystem
  • Definition Subsystem
  • Result
  • Conclusion and Future Work

Page 24: QUALIFIER in TREC-12 QA Main Task

List System Overview

[Figure: system architecture diagram, with the same labelled components as on the Factoid System Overview slide.]

Page 25: QUALIFIER in TREC-12 QA Main Task

List Subsystem

Multiple Answers from Same Paragraph

Canonicalization Resolution
  • Unique answer for “the States”, “USA”, “United States”, etc.

Pattern-based Answer Extraction
  • <same_type_NE>, <same_type_NE> and <same_type_NE> + verb …
  • … include: <same_type_NE>, <same_type_NE>, <same_type_NE> …
  • “list of …”
  • “top” + number + adj-superlative
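Canonicalization resolution for list answers can be pictured as mapping every surface variant to one canonical form and keeping each canonical form once. The alias table and the function name `canonicalize` below are assumptions for illustration; the slides do not say how QUALIFIER stores or derives its variants.

```python
# Hypothetical alias table: lower-cased surface form -> canonical form.
ALIASES = {
    "the states": "United States",
    "usa": "United States",
    "united states": "United States",
}


def canonicalize(answers):
    """Map each answer to its canonical form and drop duplicates, keeping order."""
    unique = []
    for answer in answers:
        cleaned = answer.strip()
        canonical = ALIASES.get(cleaned.lower(), cleaned)
        if canonical not in unique:
            unique.append(canonical)
    return unique


print(canonicalize(["the States", "USA", "Canada", "United States"]))
# ['United States', 'Canada']
```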

Page 26: QUALIFIER in TREC-12 QA Main Task

List Results

Runs: nusmmlr1, nusmmlr2, nusmmlr3

Average precision  0.568
Average recall     0.264
Average F1         0.317

Page 27: QUALIFIER in TREC-12 QA Main Task

Outline

  • Introduction
  • Factoid Subsystem
  • List Subsystem
  • Definition Subsystem
  • Result
  • Conclusion and Future Work

Page 28: QUALIFIER in TREC-12 QA Main Task

System Overview

[Figure: system architecture diagram, with the same labelled components as on the Factoid System Overview slide.]

Page 29: QUALIFIER in TREC-12 QA Main Task

Definition Subsystem

[Figure: definition subsystem pipeline. Labelled components: Input: Relevant Sentences; Definitional Pattern Repository; Sentence Ranking; Statistics for words in the sentences; Construct queries; Web; Web Snippets; Most co-occurring words in Web snippets; Sentence Selection (Progressive MMR); Definition.]

Page 30: QUALIFIER in TREC-12 QA Main Task

Definition Subsystem

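Pre-processing
  • Document filter
  • Anaphora resolution
  • Sentence “positive set” and “negative set”

Sentence Ranking
  • Sentence weighting in Corpus
    [Equation: Weight_Corpus(s), involving log(1 + ·) terms over SF_Corpus(w) for words w in the positive and negative sets, and # Negative Sentences. The exact form is not recoverable.]
  • Sentence weighting in Web
    [Equation: Weight_Web(s), involving log(1 + ·) terms over SF_Web(w), SF_Corpus_Positive(w) and # Positive Sentences. The exact form is not recoverable.]
  • Overall weighting
    [Equation: Weight(s), combining Weight_Corpus and Weight_Web with a (1 − ·) coefficient. The exact form is not recoverable.]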

Page 31: QUALIFIER in TREC-12 QA Main Task

Definition Subsystem

Answer Generation (Progressive Maximal Marginal Relevance)

1. Order all sentences by weight, in descending order.
2. Add the first sentence to the summary.
3. Examine the following sentences: if Weight(stc) - Weight(next_stc) > avg_sim(stc), add next_stc to the summary.
4. Go to Step 3 until the length limit of the target summary is reached.
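The four steps can be sketched as a short selection loop. The slide does not define `stc` or `avg_sim`, so the code makes two explicit assumptions: `stc` is taken to be the sentence most recently added to the summary, and `avg_sim` is a caller-supplied function returning a similarity score for that sentence. The function name, the toy weights and the constant similarity are all hypothetical.

```python
def progressive_mmr(sentences, weight, avg_sim, max_len):
    """Select sentences following the slide's four steps, read literally.

    Assumptions: `stc` is the last sentence added to the summary, and
    `avg_sim(stc)` is supplied by the caller.
    """
    ranked = sorted(sentences, key=weight, reverse=True)  # step 1
    if not ranked or max_len < 1:
        return []
    summary = [ranked[0]]                                 # step 2
    stc = ranked[0]
    for next_stc in ranked[1:]:                           # steps 3 and 4
        if len(summary) >= max_len:
            break
        if weight(stc) - weight(next_stc) > avg_sim(stc):
            summary.append(next_stc)
            stc = next_stc
    return summary


# Toy weights and a constant similarity, invented for illustration.
weights = {"a": 1.0, "b": 0.95, "c": 0.7, "d": 0.65}
picked = progressive_mmr(list(weights), weights.get, lambda s: 0.1, max_len=3)
print(picked)  # ['a', 'c']
```

With these toy values, "b" and "d" are skipped because their weights are too close to the sentence selected just before them.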

Page 32: QUALIFIER in TREC-12 QA Main Task

Definition Results

We empirically set the length of the summary for People and Objects based on question classification results.

Run #  # sentences             Algorithm       Result
1      People: 12, Object: 10  Full sentences  0.471
2      People: 12, Object: 10  Text fragments  0.479
3      People: 10, Object: 8   Text fragments  0.460

Page 33: QUALIFIER in TREC-12 QA Main Task

Outline

  • Introduction
  • Factoid Subsystem
  • List Subsystem
  • Definition Subsystem
  • Result
  • Conclusion and Future Work

Page 34: QUALIFIER in TREC-12 QA Main Task

Overall Performance

nusmmlr1  0.471
nusmmlr2  0.479
nusmmlr3  0.460

Page 35: QUALIFIER in TREC-12 QA Main Task

Conclusion and Future Work

Conclusion
  • Event-based Question Answering
  • Factoid and list questions exploit the power of event-based QA
  • Definition question answering combines IR and summarization
  • An ontology is used to boost the performance of our NE and answer justification modules

Future Work
  • Give a formal proof of our QA event hypothesis
  • Work towards an online question answering system
  • Interactive QA
  • Analysis and opinion questions
  • VideoQA: question answering on news video