
Page 1:

Question Answering using Enhanced Lexical Semantic Models

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak

ACL 2013

October 23rd, 2013
U.Mich. NLP Reading Group

Presented by V.G.Vinod Vydiswaran ([email protected])


Page 2:

Who won the best actor Oscar in 1973?

A1: Jack Lemmon won the Academy Award for Best Actor for Save the Tiger (1973)

A2: Oscar winner Kevin Spacey said that Jack Lemmon is remembered as always making time for other people.

• Answer selection is a key step in addressing QA
• Conceptually, a semantic matching problem
• Semantic structure matching is one approach


Page 3:

Latent word-alignment view

• Task is to classify a question/sentence pair
• Words are “aligned” based on similarity
• Multiple functions can be used for this purpose

What is the fastest car in the world?

The Jaguar XJ220 is the dearest, fastest, and most sought after car on the planet.


Page 4:

Lexical Semantic Models

• Synonymy and antonymy: PILSA model (Yih et al., 2012)

• Hypernymy and hyponymy: Probase (Wu et al., 2012)

• Semantic word similarity: based on three vector space models


Page 5:

1. Synonymy and Antonymy

• PILSA: Polarity Inducing Latent Semantic Analysis
• Signed d-by-n co-occurrence matrix
  - d: number of word groups; n: vocabulary size
  - each element: tf-idf of the corresponding word in the group
  - antonyms are given a negative value
• Low-rank approximation derived by singular value decomposition
• Synonymy/antonymy: cosine score between column vectors
• Learned over the Encarta thesaurus, plus a discriminative projection-matrix training method
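As a rough illustration, here is a minimal PILSA-style sketch in Python/numpy. It assumes unit weights in place of real tf-idf values, omits the discriminative projection step, and uses a made-up toy thesaurus; it is not the paper's implementation.

```python
import numpy as np

# Hypothetical toy thesaurus: each group lists synonyms (+) and antonyms (-).
groups = [
    {"syn": ["happy", "glad", "joyful"], "ant": ["sad", "unhappy"]},
    {"syn": ["fast", "quick", "rapid"],  "ant": ["slow"]},
]

vocab = sorted({w for g in groups for w in g["syn"] + g["ant"]})
idx = {w: i for i, w in enumerate(vocab)}

# d-by-n signed matrix: +weight for synonyms, -weight for antonyms.
# 1.0 stands in for the real tf-idf weight.
M = np.zeros((len(groups), len(vocab)))
for r, g in enumerate(groups):
    for w in g["syn"]:
        M[r, idx[w]] = 1.0
    for w in g["ant"]:
        M[r, idx[w]] = -1.0

# Low-rank approximation via SVD; each word gets a k-dim column vector.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
W = (np.diag(S[:k]) @ Vt[:k]).T     # one k-dim vector per word

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Positive cosine suggests synonymy, negative suggests antonymy.
print(cosine(W[idx["happy"]], W[idx["glad"]]))   # high positive
print(cosine(W[idx["happy"]], W[idx["sad"]]))    # negative
```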


Page 6:

2. Hypernymy and Hyponymy

• Limitations of WordNet motivate Probase
• Probase: automatically extracted connections among 2.7M concepts, obtained by applying Hearst patterns to 1.68B web pages

• Probabilistic value for each relation, based on co-occurrence
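A hedged sketch of how such a probabilistic isA score could be computed from Hearst-pattern co-occurrence counts; the counts and the exact estimator below are assumptions for illustration, not Probase's published internals.

```python
# count[(concept, instance)] = times "concept such as instance" (etc.) was seen
counts = {
    ("planet", "saturn"): 9500,
    ("company", "saturn"): 300,
    ("color", "brown"): 7200,
    ("color", "beige"): 2100,
}

def p_concept_given_instance(concept, instance):
    """P(concept | instance) ~ n(concept, instance) / sum_c n(c, instance)."""
    total = sum(n for (c, i), n in counts.items() if i == instance)
    return counts.get((concept, instance), 0) / total if total else 0.0

print(p_concept_given_instance("planet", "saturn"))   # ~0.97
print(p_concept_given_instance("company", "saturn"))  # ~0.03
```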

Q: What color is Saturn?
S: Saturn is a giant gas planet with brown and beige clouds.

Q: Who wrote Moonlight Sonata?
S: Ludwig van Beethoven composed the Moonlight Sonata in 1801.


Page 7:

3. Semantic word similarity

• Vector space models based on distributional similarity
• Three vector space models:
  - Wikipedia contexts (Yih and Qazvinian, 2012)
  - Recurrent neural network language models (RNNLM) (Mikolov et al., 2012): 640-dim vectors trained on the Broadcast News corpus
  - Concept projection over click-through data (Gao et al., 2011)
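All three models reduce word similarity to cosine between dense vectors. A minimal sketch follows, with random stand-in embeddings in place of the actual Wikipedia-context, RNNLM, or click-through vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in 640-dim embeddings; a real model would supply trained vectors.
emb = {w: rng.standard_normal(640)
       for w in ["fastest", "dearest", "car", "planet", "world"]}

def word_sim(w1, w2):
    """Cosine similarity between two word vectors."""
    v1, v2 = emb[w1], emb[w2]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

print(word_sim("world", "planet"))  # would be high with real embeddings
```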


Page 8:

Matching models

Bag-of-words model, for each word-similarity function $\phi_j$, where $x = (q, s)$, $V_q = \{w_{q_1}, w_{q_2}, \ldots, w_{q_m}\}$, and $V_s = \{w_{s_1}, w_{s_2}, \ldots, w_{s_n}\}$:

$$\phi_j^{avg}(q, s) = \frac{1}{mn} \sum_{w_q \in V_q} \sum_{w_s \in V_s} \phi_j(w_q, w_s)$$

$$\phi_j^{max}(q, s) = \max_{w_q \in V_q,\ w_s \in V_s} \phi_j(w_q, w_s)$$

Learning latent structures (LCLR; Chang et al., 2010): prediction is $\arg\max_h \theta^T \Phi(x, h)$; training solves

$$\min_{\theta} \ \frac{1}{2}\|\theta\|^2 + C \sum_i \xi_i^2 \quad \text{s.t.} \quad \xi_i \geq 1 - y_i \max_h \theta^T \Phi(x_i, h)$$
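A small sketch of the two bag-of-words features for a single similarity function $\phi_j$, here instantiated with exact string match (the identical-word feature); any of the lexical semantic models above could be plugged in as `sim`.

```python
import itertools

def avg_feature(q_words, s_words, sim):
    """phi_avg(q, s): mean similarity over all question/sentence word pairs."""
    pairs = list(itertools.product(q_words, s_words))
    return sum(sim(wq, ws) for wq, ws in pairs) / len(pairs)

def max_feature(q_words, s_words, sim):
    """phi_max(q, s): similarity of the best-matching word pair."""
    return max(sim(wq, ws) for wq, ws in itertools.product(q_words, s_words))

# Toy similarity: 1 for identical words, else 0 (the "I" feature).
ident = lambda a, b: 1.0 if a == b else 0.0
q = ["fastest", "car", "world"]
s = ["jaguar", "fastest", "car", "planet"]
print(avg_feature(q, s, ident), max_feature(q, s, ident))  # 0.1666..., 1.0
```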


Page 9:

Evaluation Setup

• Derived from TREC-QA by Wang et al. (2007); ~33 candidate sentences per question
• Training: 5,919 manually labeled question/sentence pairs from TREC 8-12
• Dev / Test: 1,374 / 1,866 pairs from TREC-13
• Candidate sentences over 40 words removed
• Evaluation measures: MAP and MRR
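For reference, a minimal sketch of the two measures, assuming each question comes with a 0/1 relevance list over its ranked candidates.

```python
def average_precision(labels_sorted):
    """AP for one question: labels_sorted is 0/1 relevance in rank order."""
    hits, ap = 0, 0.0
    for rank, rel in enumerate(labels_sorted, start=1):
        if rel:
            hits += 1
            ap += hits / rank
    return ap / hits if hits else 0.0

def reciprocal_rank(labels_sorted):
    """1 / rank of the first correct candidate (0 if none)."""
    for rank, rel in enumerate(labels_sorted, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# MAP / MRR are the means over all questions.
questions = [[0, 1, 0, 1], [1, 0, 0]]   # toy relevance lists, in rank order
print(sum(map(average_precision, questions)) / len(questions))  # MAP = 0.875? no: 0.75
print(sum(map(reciprocal_rank, questions)) / len(questions))    # MRR = 0.75
```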


Page 10:

Baselines

• Random: assign a random score to each candidate sentence
• Word Count: word overlap, excluding stopwords
• Weighted Word Count: overlapping words weighted by the idf of the question word
• Three existing methods, primarily based on tree structures:
  - Syntax-driven dependency-tree matching (Wang et al., 2007)
  - Quasi-synchronous grammar with a tree-edit CRF model (Wang and Manning, 2010)
  - Tree-kernel function between dependency trees (Heilman and Smith, 2010)
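A sketch of the two word-count baselines; the stopword list and the idf values below are placeholders, since the slide does not specify them.

```python
import math

STOPWORDS = {"the", "is", "in", "what", "a", "of"}  # placeholder list

def word_count(q_words, s_words):
    """Unweighted overlap of non-stopword question words with the sentence."""
    s_set = set(s_words)
    return sum(1 for w in q_words if w not in STOPWORDS and w in s_set)

def weighted_word_count(q_words, s_words, idf):
    """Overlapping question words weighted by their idf."""
    s_set = set(s_words)
    return sum(idf.get(w, 0.0) for w in q_words
               if w not in STOPWORDS and w in s_set)

idf = {"fastest": math.log(1000 / 3), "car": math.log(1000 / 50)}  # made up
q = "what is the fastest car in the world".split()
s = ("the jaguar xj220 is the dearest fastest and most "
     "sought after car on the planet").split()
print(word_count(q, s), weighted_word_count(q, s, idf))  # 2, ~8.80
```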


Page 11:

Performance of baseline systems

Best MAP 0.61, best MRR 0.70

System                     MAP    MRR
Wang et al. (2007)         0.603  0.685
Wang and Manning (2010)    0.595  0.695
Heilman and Smith (2010)   0.609  0.692


Page 12:

Simple baseline results

Baseline systems: MAP 0.609, MRR 0.695

Baseline              Dev           Test
                      MAP    MRR    MAP    MRR
Random                0.524  0.582  0.471  0.529
Word count            0.652  0.722  0.626  0.682
Weighted word count   0.711  0.788  0.653  0.707


Page 13:

Adding lexical semantics

• I: identical word matching (+weights)
• L: lemma matching (+weights)
• WN: WordNet synonyms, antonyms, hypernyms/hyponyms (+weights)
• LS: enhanced lexical semantics (+weights)
• NE: whether the word is part of a comparable named-entity string (+weights)
• QW: whether the question word and the named entity are compatible


Page 14:

Models evaluated

• Unstructured, bag-of-words setting:
  - Logistic regression (LR)
  - Boosted decision trees (BDT)

• Structured output setting:
  - LCLR, with all question words covered
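A speculative sketch of LCLR-style inference under the all-question-words-covered constraint: assuming the score decomposes over individual alignment links, the argmax alignment simply picks each question word's best-scoring sentence word independently. The decomposability assumption and function names are mine, not the paper's.

```python
def best_alignment(q_words, s_words, link_score):
    """Return the highest-scoring alignment h and its total score,
    aligning every question word to some sentence word."""
    h, total = {}, 0.0
    for wq in q_words:
        best = max(s_words, key=lambda ws: link_score(wq, ws))
        h[wq] = best
        total += link_score(wq, best)
    return h, total

ident = lambda a, b: 1.0 if a == b else 0.0
print(best_alignment(["fastest", "car"], ["fastest", "car", "planet"], ident))
# ({'fastest': 'fastest', 'car': 'car'}, 2.0)
```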


Page 15:

Lexical semantic features help

• +8 to +12% going from I to All
• +11% MAP, +12% MRR for LCLR with All over the baseline (I with LR)
• +25.6% relative MAP, +18.8% relative MRR over published baseline systems

Feature set    LR            BDT           LCLR
               MAP    MRR    MAP    MRR    MAP    MRR
I              0.653  0.707  0.632  0.690  0.663  0.728
I+L            0.674  0.722  0.650  0.692  0.682  0.727
I+L+WN         0.704  0.771  0.680  0.745  0.732  0.792
I+L+WN+LS      0.734  0.811  0.752  0.846  0.763  0.823
All            0.737  0.817  0.750  0.845  0.765  0.826

Baseline systems: MAP 0.609; MRR 0.695


Page 16:

Limitation of just word matching

Main sources of error:
• missing/erroneous entity relationships
• lack of robust question analysis
• lack of semantic inference

Q: In what film is Gordon Gekko the main character?

S: He received a best actor Oscar in 1987 for his role as Gordon Gekko in “Wall Street”.


Page 17:

Takeaways & Discussion

• Looks at a specific step in a QA pipeline: answer sentence selection
• Systematic analysis of adding (improved) lexical semantic models
• Characteristic of the dataset?
