Adaptive Subjective Triggers for Opinionated Document Retrieval

Kazuhiro Seki
Organization of Advanced Science & Technology, Kobe University

Kuniaki Uehara
Graduate School of Engineering, Kobe University

2/10/2009

Background

• Increasing user-generated content (UGC) on the web
  – often contains personal subjective opinions
• Can be helpful for personal/corporate decision making → demand for retrieving personal opinions about a given entity
• Traditional IR aims to find documents relevant to a given topic (entity)
  – not concerned with subjectivity
• Aim: Retrieve documents not only pertinent to a given entity but also containing subjective opinions

An (existing) approach

• Lexicon-based (Mishne, 2006; Zhang et al., 2008; etc.)
  – Look for subjective words/phrases
    • “like” conveys favorable feelings
      – “I like the movie.”
  – Potential drawback
    • Words/phrases taken out of context do not necessarily indicate subjectivity
      – “It looks like a cat.”
      – “She likes singing.”

Another approach considering wider context

• n-gram language model
  – estimate word occurrence probabilities based on prior context or history, i.e., the preceding (n – 1) words
    • bigram: $P(w_i \mid w_{i-1})$
    • trigram: $P(w_i \mid w_{i-2}, w_{i-1})$
  – Generally, n is set to 2 or 3
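
A minimal sketch of the bigram case, assuming plain maximum-likelihood estimation from counts with no smoothing (the slides do not say how the probabilities are estimated). The name `train_bigram`, the `<s>` start marker, and the toy corpus are illustrative assumptions.

```python
from collections import Counter

def train_bigram(sentences):
    """Return a function prob(w, prev) giving the MLE estimate of P(w | prev)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens          # hypothetical sentence-start marker
        for prev, w in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, w)] += 1

    def prob(w, prev):
        # Unseen histories get probability 0 here; a real model would smooth.
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, w)] / history_counts[prev]

    return prob

corpus = [
    ["i", "like", "the", "movie"],
    ["i", "like", "singing"],
    ["it", "looks", "like", "a", "cat"],
]
p = train_bigram(corpus)
print(p("like", "i"))     # "i" is always followed by "like" in the toy corpus
print(p("the", "like"))   # "like" is followed by three different words
```

With only one word of history, "like" after "looks" and "like" after "I" are both scored from a very local context, which is the limitation the trigger models address.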

Trigger models (Lau et al., 1993)

• Incorporate long-distance dependencies that cannot be handled by n-gram models
• Trigger pairs
  – word pairs such that one tends to bring about the occurrence of the other
    • nor → either (syntactic dependency)
    • memory → GB (semantic dependency)
• Used by linearly interpolating with an n-gram model
    $(1 - \lambda) \cdot P_B(w \mid h) + \lambda \cdot P_T(w \mid h)$
  – $P_B$: n-gram model
  – $P_T$: trigger model
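
The interpolation above can be sketched as follows. The distributions and the value of λ are made-up toy numbers, and `interpolate` is a hypothetical helper name; the slides do not report the λ that was used.

```python
def interpolate(p_b, p_t, lam):
    """Return (1 - lam) * P_B + lam * P_T for every word in either distribution."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_b) | set(p_t)
    return {w: (1.0 - lam) * p_b.get(w, 0.0) + lam * p_t.get(w, 0.0) for w in vocab}

# Toy next-word distributions for a history that contains "memory".
p_b = {"GB": 0.05, "the": 0.60, "of": 0.35}   # n-gram model P_B(w|h)
p_t = {"GB": 0.70, "the": 0.20, "of": 0.10}   # trigger model P_T(w|h)

mixed = interpolate(p_b, p_t, lam=0.2)
print(mixed["GB"])            # raised above the n-gram estimate by the trigger
print(sum(mixed.values()))    # still a probability distribution (up to rounding)
```

Because both inputs sum to one and the weights sum to one, the mixture is again a valid distribution for any λ in [0, 1].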

Identifying trigger pairs (Tillmann et al., 1996)

[Flow diagram; box labels: start, corpus, n-gram model $P(w \mid h)$, vocabulary, potential trigger pairs, each pair a → b, trigger model $P_T(w \mid h)$, extended model $P_E(w \mid h)$, evaluation]

• Log-likelihood difference
    $\Delta_{a \to b} = \sum_i \{\log P_E(w_i \mid h_i) - \log P(w_i \mid h_i)\}$
• When $P(b \mid h) < t$ → low-level triggers
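
A toy sketch of the log-likelihood difference used to evaluate a candidate pair a → b. Everything concrete here is an assumption for illustration: the four-word vocabulary, the uniform baseline, and an "extended" model that simply gives b a fixed probability `boost` whenever a occurs in the history. It is not the estimation procedure of Tillmann et al.

```python
import math

VOCAB = ["nor", "either", "he", "did"]

def p_base(w, h):
    # Toy baseline P(w|h): uniform over the vocabulary, ignoring the history.
    return 1.0 / len(VOCAB)

def make_extended(a, b, boost=0.7):
    """Toy extended model P_E for one pair a -> b (boost is a made-up value)."""
    def p_ext(w, h):
        if a not in h:
            return p_base(w, h)
        if w == b:
            return boost
        return (1.0 - boost) / (len(VOCAB) - 1)
    return p_ext

def log_likelihood_gain(events, p_ext):
    """Delta_{a->b}: sum over positions of log P_E(w_i|h_i) - log P(w_i|h_i)."""
    return sum(math.log(p_ext(w, h)) - math.log(p_base(w, h)) for w, h in events)

# (word, history) events for the toy sentence "he did nor either".
events = [
    ("he", ()),
    ("did", ("he",)),
    ("nor", ("he", "did")),
    ("either", ("he", "did", "nor")),
]
gain_good = log_likelihood_gain(events, make_extended("nor", "either"))
gain_bad = log_likelihood_gain(events, make_extended("he", "either"))
print(gain_good, gain_bad)
```

A pair whose trigger really precedes the triggered word gets a positive difference, while a pair that fires too often is penalized at every position where the prediction is wrong.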

Building trigger model PT

1. For each identified trigger pair (a → b), compute its association score $\alpha(b \mid a)$ based on the co-occurrences of a and b
2. Define a trigger model $P_T$ by using $\alpha(\cdot)$
   – $P_T(w \mid h)$: average association score between the words in history h and word w
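
A minimal sketch of the "average association score" idea. The formula on the slide did not survive extraction, so the details are assumptions: the average is taken over all words in the history, and pairs without a score contribute 0. The scores in `alpha` are made-up values.

```python
def trigger_prob(w, history, alpha):
    """Average of alpha[(a, w)] over the words a in the history (0 if no score)."""
    if not history:
        return 0.0
    return sum(alpha.get((a, w), 0.0) for a in history) / len(history)

# Hypothetical association scores alpha(b|a), keyed as (a, b).
alpha = {("i", "like"): 0.3, ("my", "like"): 0.1}

score = trigger_prob("like", ["i", "really"], alpha)
print(score)   # average of alpha(like|i) and a missing score counted as 0
```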

Subjective trigger model

• Assumptions
  – A personal subjective opinion consists of two main components
    • Subject of the opinion (e.g., “I”, “you”) or the object the opinion is about (e.g., “The Curious Case of Benjamin Button”)
    • Subjective expression (e.g., “like”, “feel”)
    • Treat them as triggering and triggered words, respectively
  – Triggering words are expressed as pronouns
    • Empirical finding
      – Proximity of pronouns and subjective expressions to objects is an effective measure of opinionatedness (Zhou et al., 2007; Yang et al., 2007)

Identifying “subjective” trigger pairs

• Pronouns considered
  – I, my, you, it, its, he, his, she, her, we, our, they, their, this
• History h: preceding words in the same sentence
• Corpus: 5000 customer reviews from Amazon.com
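
A sketch of how candidate pairs could be collected under the settings above: the triggering word must be one of the listed pronouns, and the history is the part of the sentence before the current word. The function name, the lowercasing, and the toy sentences are assumptions; the selection step based on the log-likelihood difference is not shown.

```python
from collections import Counter

PRONOUNS = {"i", "my", "you", "it", "its", "he", "his", "she", "her",
            "we", "our", "they", "their", "this"}

def candidate_pairs(sentences):
    """Count (pronoun, later word) pairs that co-occur within a sentence."""
    counts = Counter()
    for tokens in sentences:
        tokens = [t.lower() for t in tokens]
        for i, w in enumerate(tokens):
            # History h = words preceding w in the same sentence.
            for a in set(tokens[:i]) & PRONOUNS:
                counts[(a, w)] += 1
    return counts

reviews = [["I", "like", "this", "movie"], ["It", "looks", "great"]]
pairs = candidate_pairs(reviews)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Only pronouns can act as triggers, so a pair such as ("like", "movie") is never generated even though both words share a sentence.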

Identifying “subjective” trigger pairs (cont.)

• Low-level trigger ($P(w \mid h) < t$) causes the problem
  – Penalize frequent w with infrequent history h

Opinion retrieval

• Probability that d is relevant to q AND subjective
  – product of $P_{INM}(q \mid d)$ and $P_E(d) = \prod_i P_E(w_i \mid h_i)$
  – $P_E(d)$ is smaller for longer d
  – $P_{INM}(q \mid d)$ and $P_E(d)$ may have largely different variances
    • Normalize $P_E(d)$ by length m & take weighted sum of logs

[Diagram labels: query q; IR by INM, $P_{INM}(q \mid d)$; documents d; reranking with subj. language model $P_E(w \mid h)$; documents d]
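
A sketch of the reranking score described above, assuming the weighted sum has the form (1 − β)·log P_INM(q|d) + β·(1/m)·log P_E(d). The slide only says that P_E(d) is normalized by length and that a weighted sum of logs is taken, so the exact form and the name `beta` for the weight are assumptions, as are the toy numbers.

```python
import math

def opinion_score(p_inm, word_probs, beta):
    """Weighted sum of log P_INM(q|d) and the length-normalized log P_E(d)."""
    m = len(word_probs)
    if m == 0:
        raise ValueError("document must contain at least one word")
    # log P_E(d) = sum_i log P_E(w_i|h_i); dividing by m removes the length bias.
    avg_log_pe = sum(math.log(p) for p in word_probs) / m
    return (1.0 - beta) * math.log(p_inm) + beta * avg_log_pe

# Toy documents: (P_INM(q|d), per-word probabilities P_E(w_i|h_i)).
docs = {
    "d1": (0.02, [0.10, 0.20, 0.10]),
    "d2": (0.02, [0.01, 0.02, 0.01, 0.02]),
}
ranked = sorted(docs, key=lambda d: opinion_score(*docs[d], beta=0.5), reverse=True)
print(ranked)
```

With equal topical relevance, the document that the subjective language model predicts better per word is ranked first, regardless of its length.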

Dynamic model adaptation

• Motivation
  – Language models created from Amazon reviews may not be effective for some types of entities
• Procedure
  1. Carry out keyword search for a given topic
  2. Use k top-ranked blog posts to identify new trigger pairs (a → b) and compute $\alpha'(\cdot)$ (association scores for new triggers)
  3. Update trigger model by using the new trigger pairs
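
A rough sketch of steps 2 and 3 of the procedure. The slides do not define the association score or the update rule, so both are stand-ins: the toy score is the fraction of occurrences of a that are followed by b in the same sentence, and the update keeps existing scores and only adds pairs that are new. The names `association_scores` and `adapt` and the sample posts are hypothetical.

```python
from collections import Counter

def association_scores(sentences, triggers):
    """Toy alpha'(b|a): share of occurrences of a followed by b in the sentence."""
    pair_counts, trigger_counts = Counter(), Counter()
    for tokens in sentences:
        for i, a in enumerate(tokens):
            if a in triggers:
                trigger_counts[a] += 1
                for b in set(tokens[i + 1:]):
                    pair_counts[(a, b)] += 1
    return {(a, b): c / trigger_counts[a] for (a, b), c in pair_counts.items()}

def adapt(alpha, alpha_new):
    """Add newly found pairs; scores of already known pairs stay unchanged."""
    updated = dict(alpha_new)
    updated.update(alpha)
    return updated

alpha = {("i", "like"): 0.3}                                   # from the review corpus
top_posts = [["i", "support", "him"], ["i", "oppose", "it"]]   # k top-ranked posts
alpha_new = association_scores(top_posts, triggers={"i"})
adapted = adapt(alpha, alpha_new)
print(sorted(adapted.items()))
```

The point of the step is visible even in this toy case: expressions such as "support" and "oppose", which rarely appear in product reviews, enter the model only after looking at posts retrieved for the topic.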

Empirical evaluation

• Data
  – TREC Blog track test collection 2006
    • 3 million blog posts crawled from Dec 2005 to Feb 2006
    • 50 “topics” (user information needs)
    • Relevant & opinionated posts are explicitly labeled
• Two types of assessment
  – Evaluation of the language models
  – Their effects on opinion retrieval

Evaluation of language models

• Perplexity
  – Uncertainty of language model L in predicting word sequence ($d = w_1, \ldots, w_m$)
• Created two hypothetical documents from the Blog track collection
  – concatenate all the opinionated posts → $d_O$
  – all the relevant (but non-opinionated) posts → $d_N$
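
The perplexity formula on the slide was lost in extraction. The sketch below uses the standard definition, the exponential of the negative average log-probability per word, which is assumed to match what was shown. The input is a list of per-word probabilities that some language model has already assigned.

```python
import math

def perplexity(word_probs):
    """exp(-(1/m) * sum_i log P(w_i|h_i)) for a sequence of m words."""
    m = len(word_probs)
    if m == 0:
        raise ValueError("sequence must contain at least one word")
    return math.exp(-sum(math.log(p) for p in word_probs) / m)

uniform = perplexity([0.25] * 8)            # every word predicted with P = 1/4
confident = perplexity([0.5, 0.8, 0.5, 0.8])
print(uniform, confident)
```

A model that is as uncertain as a uniform choice among four words has perplexity of about 4, and a model that predicts the sequence better has a lower value, which is how the comparison between $d_O$ and $d_N$ is read.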

Perplexity Results

• Higher-order n-grams monotonically decrease perplexity irrespective of language models and document types
• Opinionated document $d_O$ leads to lower perplexity
• Subjective language model $P_E$ produces lower perplexity than n-gram model $P_B$

Relation between parameter β and MAP

[Plot of MAP against β; annotated gain: +22.0%]

Improvement for individual topics

Analysis on individual topics

• Topics with notable improvement
  – “MacBook Pro”. Laptop (+0.22)
  – “Heineken”. Company and brand names (+0.20)
  – “Shimano”. Company and brand names (+0.19)
  – “Board chess”. Board game (+0.13)
  – “Zyrtec”. Medication (product name) (+0.12)
  – “Mardi Gras”. Final day of carnival (+0.11)
• Most of them are products
  – Model learned from Amazon reviews is effective for products in general, including beer and medication
  – Also effective for other types of entities

Analysis on individual topics (cont.)

• Topics with performance decline
  – “Jim Moran”. Congressman (–0.15)
  – “World Trade Org.”. International organization (–0.05)
  – “Cindy Sheehan”. Anti-war activist (–0.03)
  – “Ann Coulter”. Political commentator (–0.01)
  – “West Wing”. TV drama set in the White House (–0.01)
  – “Sonic food industry”. Fast-food restaurant chain (–0.01)
• Politics and organizations are difficult to improve?
  – Bruce Bartlett (+0.07), Jihad (+0.06), McDonalds (+0.03), Qualcomm (+0.02)

Results for dynamic model adaptation

• Moderately improved performance
• For “Zyrtec”, AP improved by 47.7%

Results for model adaptation for difficult topics

• For most topics, AP slightly but consistently improved

Conclusions

• Proposed subjective trigger models reflecting subjective opinions
  – Two assumptions + a modification to low-level triggers
• Combined with an IR model for opinion retrieval
  – 22.0% improvement over INM in MAP
  – Effective for most topics, slight drop for topics concerning politics and organizations
• Dynamic model adaptation
  – Positive effect overall (+25.0% over initial search)
  – Moderately effective for politics- and organization-related topics

Future work

• Use of a larger corpus of customer reviews
• Use of labeled data in the blog track test collection
• Refine the approach to model adaptation

References

Mishne, G.: Multiple Ranking Strategies for Opinion Retrieval in Blogs, Proceedings of the 15th Text Retrieval Conference (2006).

Zhang, M. and Ye, X.: A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 411–418 (2008).

Lau, R., Rosenfeld, R. and Roukos, S.: Trigger-based language models: a maximum entropy approach, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 45–48 (1993).

Tillmann, C. and Ney, H.: Selection criteria for word trigger pairs in language modeling, Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Computer Science, pp. 95–106, Springer Berlin / Heidelberg (1996).

Zhou, G., Joshi, H. and Bayrak, C.: Topic Categorization for Relevancy and Opinion Detection, Proceedings of the 16th Text Retrieval Conference (2007).

Yang, K., Yu, N. and Zhang, H.: WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs, Proceedings of the 16th Text Retrieval Conference (2007).

Zhang, W., Yu, C. and Meng, W.: Opinion retrieval from blogs, Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 831–840 (2007).

QUESTIONS?

Comparative experiments

2006

TREC best                             0.1885
Zhang et al.                          0.2726
Ours w/ our baseline                  0.2398
Ours w/ stronger baseline (0.3022)    0.3221

Comparative experiments

2007

TREC best                             0.4341
TREC 2nd                              0.3453
TREC 3rd                              0.3264
Ours w/ our baseline (0.2508)         0.3072
Ours w/ stronger baseline (0.3784)    0.4054

Comparative experiments

2008. Same baseline

TREC best                             0.4067
TREC 2nd                              0.4006
TREC 3rd                              0.3964
Ours w/ stronger baseline (0.3822)    0.3996

Comparative experiments (polarity task)

2008. Same baseline

TREC best (ours)                      0.1448
TREC 2nd                              0.1348
TREC 3rd                              0.1129