A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts
Minghui Qiu and Jing Jiang
School of Information Systems, Singapore Management University

NAACL 2013 slides (Qiu and Jiang)


Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a principled generative latent variable model that captures three important factors: viewpoint-specific topic preference, user ID, and user interactions. Evaluation results show that our model clearly outperforms a number of baseline models both in clustering posts by viewpoint and in clustering users with different viewpoints.


Threaded Forums

• Threaded structure

• With “reply-to” relations (user interactions)

• Multiple threads on the same issue


Contrastive Viewpoints in Threaded Forums


Each Coin Has Two Sides

Pro-Obama or anti-Obama?

How can we find contrastive viewpoints from threaded forum posts?

The Chinese athlete Liu Xiang quit the London Olympic Games


Task and Method Overview


Finding viewpoints for posts

A corpus on one controversial issue

Method
• A unified model for finding contrastive viewpoints (two viewpoints) from threaded forum posts

• We build our model based on three observations

Finding viewpoints for users


Observation 1: Different Viewpoints Have Different Topic Preferences

• Our findings on the “LiuXiang” data set (“Will you support LiuXiang after he failed at the London Olympic Games?”)

[Figure: Topic focus of two viewpoints on the “LiuXiang” data set. Chart of topic probability (0 to 0.16) for topics 21, 34, 39, 28, 22, 6, 19, 31, 4, 37, 14, 8, 16, 12, 13, 30, 17, 11, 7 and 18, comparing “Support LiuXiang” with “Against LiuXiang”. Annotated topics: “Olympic hero, sympathy for his injury” and “disappointed, athlete, ad sponsors”.]


• Framing¹

– Users with different sentiments or positions tend to focus on different aspects of the topic, e.g.:

– Users for the “iPhone”: “hardware and build”, “Siri”, “iOS”
– Users against the “iPhone”: “physical keyboard”, “Android”, “Galaxy”

• Model assumption

– Each viewpoint has its own topic distribution


¹ A. Tversky and D. Kahneman. The framing of decisions and the psychology of choice. Science, 211(4481):453–458, 1981.


• User consistency

– Posts from the same user tend to have the same viewpoint towards an issue

– A user’s viewpoint can be inferred from the set of posts on the same issue, grouped by user ID

• Model assumption

– There is a user-level viewpoint distribution

– For each post by a user, its viewpoint is drawn from the corresponding user’s viewpoint distribution


Observation 2: The Same User Holds the Same Viewpoint Towards an Issue


Observation 3: User Interactions Reveal User Viewpoints

• User interaction

– User interaction: a post that replies to another user

– Users with the same viewpoint tend to have positive interactions among themselves, while users with different viewpoints tend to have negative interactions

• Sample positive and negative interactions


• Model assumption

– The interaction polarity is generated based on the viewpoint of the current post and the viewpoints of the recipient post(s)

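[Figure: Example of a positive interaction between User 1 and User 2. A table of post IDs and viewpoints lists posts 1, 2 and 3 with viewpoint V1 and post 5 (and later posts) as unknown (?). The reply “I agree with your post Dan. Obama is so …” responds to post 2 (viewpoint V1) and is marked as a positive interaction; the figure also labels a node Y9.]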


p(POS) = …

p(NEG) = 1 − p(POS)


Overview of the Model

• A probabilistic model based on three observations

– Each viewpoint’s topic preference

– User consistency

– User interaction


Related Work

• Topic-Aspect Model (TAM, Paul et al., AAAI’10)

– A viewpoint-topic model in which viewpoint and topic are orthogonal

– Does not model user interactions

• Cross-Perspective Topic Model (Fang et al., WSDM’12)

– Supervised model

• Subgroup detection

– Mining user opinions (Abu-Jbara et al., ACL’12)

– User interaction (Hassan et al., EMNLP’12)

– Does not model viewpoints


• U: number of users
• N: number of posts
• L: number of words
• z: a topic label
• x: a switch variable
– x = 0: w is a background word
– x = 1: w is a topical word
• y: a viewpoint label
• s: an interaction type

A Probabilistic Model

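[Figure: Plate diagram of the model over plates U, N and L with variables y, s, z, x and w, showing the user-level viewpoint distribution, the viewpoint-specific topic distributions, the topic-specific word distributions, and the interaction type s.]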

The polarity of each interaction type is learned beforehand.
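The three observations can be read as a generative story. The following is a minimal forward simulation in Python/NumPy, offered as an illustration under stated assumptions and not as the authors' implementation: symmetric Dirichlet priors with made-up hyperparameters, a fixed background-word probability `p_background`, per-word switch and topic variables (as the plate labels suggest), a random reply structure, and a hypothetical agreement probability `p_agree_pos` for generating the interaction polarity s. The slides do not give the actual form of p(POS), so that part is purely illustrative. `generate_corpus` and all its parameters are hypothetical names.

```python
import numpy as np


def generate_corpus(num_users=3, posts_per_user=4, words_per_post=10,
                    num_topics=5, vocab_size=50, num_viewpoints=2,
                    p_background=0.3, p_agree_pos=0.8, seed=0):
    """Toy forward simulation of the three modeling assumptions (hypothetical)."""
    rng = np.random.default_rng(seed)
    # Topic-specific word distributions, plus one background word distribution (x = 0).
    phi = rng.dirichlet(np.full(vocab_size, 0.2), size=num_topics)
    phi_bg = rng.dirichlet(np.full(vocab_size, 0.2))
    # Observation 1: each viewpoint has its own topic distribution.
    theta = rng.dirichlet(np.full(num_topics, 0.5), size=num_viewpoints)

    posts = []
    for u in range(num_users):
        # Observation 2: a user-level viewpoint distribution shared by all of u's posts.
        pi_u = rng.dirichlet(np.ones(num_viewpoints))
        for _ in range(posts_per_user):
            y = int(rng.choice(num_viewpoints, p=pi_u))
            words = []
            for _ in range(words_per_post):
                topical = rng.random() >= p_background  # x = 1 with prob. 1 - p_background
                if topical:
                    z = rng.choice(num_topics, p=theta[y])
                    words.append(int(rng.choice(vocab_size, p=phi[z])))
                else:
                    words.append(int(rng.choice(vocab_size, p=phi_bg)))
            posts.append({"user": u, "viewpoint": y, "words": words})

    # Observation 3 (hypothetical form): each post after the first replies to a random
    # earlier post; sharing a viewpoint makes a positive interaction more likely.
    interactions = []
    for n in range(1, len(posts)):
        m = int(rng.integers(0, n))
        same = posts[n]["viewpoint"] == posts[m]["viewpoint"]
        p_pos = p_agree_pos if same else 1.0 - p_agree_pos
        interactions.append((n, m, "POS" if rng.random() < p_pos else "NEG"))
    return posts, interactions


posts, interactions = generate_corpus(seed=1)
print(posts[0], interactions[:3])
```

In the actual model these distributions are latent and are inferred from the observed words, user IDs and interaction polarities; the forward simulation only makes the assumed dependencies concrete.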


Polarity Prediction for Interaction Type

• Supervised learning

– Requires labeled data

• Unsupervised approach

– Sample sentence: “I agree with you”
– Finding interaction expressions
• Find sentences that mention the recipient (by user name or a second-person pronoun), e.g., “you”
• Take the surrounding words within a text window of 8 words, e.g., “I agree”
– Interaction polarity
• Positive if the window contains more positive than negative sentiment words; otherwise negative
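The unsupervised heuristic above can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical sentiment lexicon, a simple regex tokenizer for English, and that the 8-word window extends on each side of the mention (the slide does not specify this). `interaction_polarity`, `POSITIVE`, `NEGATIVE` and `SECOND_PERSON` are hypothetical names.

```python
import re

# Hypothetical toy sentiment lexicons; a real system would use a full sentiment lexicon.
POSITIVE = {"agree", "right", "good", "great", "thanks", "true"}
NEGATIVE = {"disagree", "wrong", "bad", "stupid", "nonsense", "false"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def interaction_polarity(post, recipient, window=8):
    """Label a reply as 'POS' or 'NEG' by counting sentiment words near recipient mentions."""
    pos = neg = 0
    for sentence in re.split(r"[.!?]+", post):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Positions where the recipient is mentioned by user name or a second-person pronoun.
        mentions = [i for i, t in enumerate(tokens)
                    if t == recipient.lower() or t in SECOND_PERSON]
        # Union of the text windows around all mentions (assumed: `window` words on each side).
        span = set()
        for i in mentions:
            span.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
        for j in span:
            pos += tokens[j] in POSITIVE
            neg += tokens[j] in NEGATIVE
    # Positive only if positive words outnumber negative ones; otherwise negative.
    return "POS" if pos > neg else "NEG"


print(interaction_polarity("I agree with your post Dan. Obama is so great", "Dan"))  # POS
```

Taking the union of windows avoids counting the same sentiment word twice when a sentence mentions the recipient more than once; ties (including replies with no recipient mention) fall to “negative”, following the slide's “otherwise negative” rule.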


Evaluation

• Data Sets

– English Data Sets

• The three most discussed threads from the data of Abu-Jbara et al., ACL’12

– Chinese Data Sets

• Three popular controversial issues on TianYaClub (one of the most popular Chinese online forums)

• Statistics


Data Annotation

• Identification of viewpoints

– 150 randomly sampled posts, two annotators (Cohen’s kappa agreement ≥ 0.61)

• Identification of user groups

– 150 randomly sampled users, two annotators (Cohen’s kappa agreement ≥ 0.70)

Labeling a user’s viewpoint is easier than labeling a post’s viewpoint
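For reference, Cohen’s kappa compares the observed agreement p_o with the chance agreement p_e implied by each annotator’s label frequencies: κ = (p_o − p_e) / (1 − p_e). A small sketch of the computation (the function name and the toy labels are hypothetical, not the paper’s annotations):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy example: p_o = 0.75, p_e = 0.5, so kappa = 0.5.
print(cohens_kappa(["pro", "pro", "anti", "anti"], ["pro", "anti", "anti", "anti"]))  # 0.5
```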


Baselines

• Topic-Aspect Model (TAM, Paul et al., AAAI’10)

– A viewpoint-topic model in which viewpoint and topic are orthogonal

• Degenerate variants of our model
– UIM: user interaction model (part of our model)

– JVTM: joint viewpoint-topic model (our model without user interactions)

– JVTM-G: JVTM with a global viewpoint distribution


Identification of Viewpoints

• Task

– To identify each post’s viewpoint

• Results

• Our model significantly outperforms the other models (at the 10% significance level)

• Effectiveness of assumptions
• Each viewpoint’s topic preference: JVTM > TAM

• User consistency: JVTM > JVTM-G

• User interaction: JVTM-UI > others

• User interaction is more important than the other factors

[Table: Averaged results of the models in the identification of viewpoints]


Identification of User Groups

• Subgroup detection

– To detect ideological subgroups, i.e., user groups with different viewpoints

• Results

• Our model significantly outperforms the other methods (at the 10% significance level)

• Effectiveness of assumptions
• Each viewpoint’s topic preference: JVTM > TAM

• User consistency: JVTM > JVTM-G

• User interaction: JVTM-UI > others

[Table: Averaged results of the models in the identification of user groups]


Qualitative Analysis

• User interaction network on “Will you vote Obama”

Green (left) and white (right) nodes represent users with the two different viewpoints discovered by our model. Red (thin) edges represent negative interactions, while blue (thick) edges represent positive interactions.

More intra-cluster positive interactions and more inter-cluster negative interactions
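The pattern in such a network can be quantified by tallying interactions according to whether both users received the same discovered viewpoint. The sketch below is illustrative only: `interaction_summary` and the toy users are hypothetical, and the POS/NEG labels are assumed to come from the interaction polarity step.

```python
from collections import Counter


def interaction_summary(viewpoint, edges):
    """Tally interactions as (intra|inter, POS|NEG) given each user's discovered viewpoint."""
    counts = Counter()
    for a, b, polarity in edges:
        # Intra-cluster if both users were assigned the same viewpoint, else inter-cluster.
        where = "intra" if viewpoint[a] == viewpoint[b] else "inter"
        counts[(where, polarity)] += 1
    return counts


# Toy data with hypothetical users and labels.
viewpoint = {"u1": 0, "u2": 0, "u3": 1}
edges = [("u1", "u2", "POS"), ("u1", "u3", "NEG"), ("u2", "u3", "NEG"), ("u3", "u1", "POS")]
summary = interaction_summary(viewpoint, edges)
print(summary[("intra", "POS")], summary[("inter", "NEG")], summary[("inter", "POS")])  # 1 2 1
```

A good viewpoint assignment should make the ("intra", "POS") and ("inter", "NEG") counts large relative to the other two cells.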


Qualitative Analysis

• Users with different viewpoints tend to have different topic focus

[Figure: Topic focus of two viewpoints on the “LiuXiang” data set, comparing topic probabilities for “Support LiuXiang” and “Against LiuXiang” over the same topics as in Observation 1.]


Qualitative Analysis

• Top 4 topics for the “supporting LiuXiang” viewpoint (Chinese words with English translations)

Topic 1 | Topic 2 | Topic 3 | Topic 4
刘翔 (LiuXiang) | 栏 (hurdle) | 运动员 (athlete) | 第一 (first)
冠军 (champion) | 伤 (injury) | 奥运会 (Olympic) | 时间 (time)
赛后 (after-game) | 成绩 (record) | 跟腱 (Achilles tendon) | 奥运 (Olympic)
田径 (track and field) | 摔倒 (fall) | 北京 (Beijing) | 获得 (achieve)
男子 (men’s) | 13秒 (13 s) | 脚 (foot) | 一个 (one)
最后 (finally) | 手术 (surgery) | 伦敦 (London) | 届 (time)
刘 (Liu) | 决赛 (final) | 田联 (IAAF) | 情况 (condition)
奥运会 (Olympic) | 英国 (Britain) | 医生 (doctor) | 训练 (train)
参加 (attend) | 受伤 (hurt) | 上海 (Shanghai) | 重 (heavy)
跑 (run) | 赛场 (field) | 记者 (reporter) | 导致 (result in)
已经 (already) | 断裂 (broken) | 好 (good) | 遗憾 (pity)
纪录 (record) | 英雄 (hero) | 团队 (team) | 联赛 (league matches)
12秒 (12 s) | 预赛 (first heat) | 夺冠 (champion) | 需要 (need)
当时 (that time) | 2012年 (2012) | 跳 (jump) | 第二 (2nd)
退役 (retire) | 罗伯斯 (Robles) | 跑道 (track) | 伟大 (great)


Qualitative Analysis

• Top 4 topics for the “against LiuXiang” viewpoint (Chinese words with English translations)

Topic 1 | Topic 2 | Topic 3 | Topic 4
帖 (post) | 发自 (sent from) | 天涯 (Tianya) | 天涯 (Tianya)
社区 (community) | 随时 (anytime) | 楼主 (original poster) | 抵制 (resist)
热点 (hot) | 老板 (boss) | 猫 (sneak) | 骗子 (liar)
围观 (look on) | 政协 (CPPCC) | 妈 (F**K) | 体坛 (sports)
傻逼 (fool) | 唯金牌论 (gold-medal-only theory) | 帮 (those) | 水 (spam)
最 (most) | 微笑 (smile) | 孙子 (foolish) | 提 (mention)
钱 (money) | 顶 (support) | 啤酒 (beer) | 吃 (eat)
水军 (spam) | 恶心 (nausea) | 杨 (Yang) | 牌 (medal)
笑 (laugh) | 可口可乐 (Coca-Cola) | 全家 (whole family) | 苦笑 (bitter smile)
骂 (scold) | 喝 (drink) | 别有用心 (ulterior motive) | 高尚 (noble)
孙子 (foolish) | 笑话 (joke) | 躲 (hide) | 有力 (powerful)
你们 (you) | 加油 (cheer up) | 歪风 (bad tendency) | 劳民伤财 (a waste of money and manpower)
多么 (extremely) | 脱离 (separate) | 看看 (look) | 黑 (smear)
有人 (someone) | 枪眼 (force of public opinion) | 滩 (those) | 黄继光 (Huang Jiguang, a war hero)
脸上 (face) | 神位 (fame) | 精神 (spirit) | 神像 (fame)


Summary


• Conclusion
– A viewpoint discovery model for threaded forums
– Modeling three observations:
• Viewpoint-specific topic distribution (framing)
• User consistency
• Interplay between user interactions and viewpoints

• Future work
– Document representation: complex lexical units
– A more accurate interaction polarity classifier
– Contrastive viewpoint summarization

– Mining controversial issues and finding viewpoints


Thank you


References

• [Paul et al., AAAI’10] Paul, M. J. and Girju, R. (2010). A two-dimensional topic-aspect model for discovering multi-faceted topics. In AAAI.

• [Abu-Jbara et al., ACL’12] Abu-Jbara, A. et al. (2012). Subgroup detection in ideological discussions. In ACL.

• [Fang et al., WSDM’12] Fang, Y. et al. (2012). Mining contrastive opinions on political texts using cross-perspective topic model. In WSDM, pages 63–72.

• [Hassan et al., EMNLP’12] Hassan, A. et al. (2012). Detecting subgroups in online discussions by modeling positive and negative relations among participants. In EMNLP.
