DESCRIPTION
Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a principled generative latent variable model which captures three important factors: viewpoint-specific topic preference, user identity and user interactions. Evaluation results show that our model clearly outperforms a number of baseline models in terms of both clustering posts based on viewpoints and clustering users with different viewpoints.
A Latent Variable Model for
Viewpoint Discovery
from Threaded Forum Posts
Minghui Qiu and Jing Jiang
School of Information Systems
Singapore Management University
Threaded Forums
• Threaded structure
• With "reply-to" relations (user interactions)
• Multiple threads on the same issue
Contrastive viewpoints in Threaded Forums
Each Coin Has Two Sides
Pro Obama or
Anti Obama?
How to find contrastive viewpoints
from threaded forum posts?
The Chinese athlete Liu Xiang quit the London Olympic Games
Task and Method Overview
A corpus on one controversial issue
• Finding viewpoints for posts
• Finding viewpoints for users
Method
• A unified model for finding contrastive viewpoints (two-viewpoint setting)
from threaded forum posts
• We build our model based on three observations
Observation 1: Different Viewpoints Will
Have Different Topic Preference
• Our findings on the "LiuXiang" data set ("Will you
support LiuXiang after he failed in the London Olympic
Games?")
[Figure: Topic focus of two viewpoints on the "LiuXiang" data set. Bar chart of topic proportion (0 to 0.16) by topic ID for the "Support LiuXiang" and "Against LiuXiang" viewpoints. Callouts: "Olympic hero, sympathy on his injury" and "disappointed, athlete, ad sponsors".]
• Framing (Tversky and Kahneman, 1981)
– Users with different sentiments or positions focus on
different aspects of a topic. E.g.:
– Users for "iPhone": "hardware and build", "siri", "ios"
– Users against "iPhone": "physical keyboard", "android", "galaxy"
• Model assumption
– Each viewpoint has its own topic distribution
Tversky, A. and Kahneman, D. (1981). The framing of decisions and
the psychology of choice. Science, pages 453–458.
Observation 2: The Same User Will Hold
the Same Viewpoint Towards an Issue
• User consistency
– Posts from the same user tend to have the same
viewpoint towards an issue
– A viewpoint can be derived from the set of posts
on the same issue grouped by user ID
• Model assumption
– There is a user-level viewpoint distribution
– For each post by a user, its viewpoint is drawn from
that user's viewpoint distribution
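A minimal sketch of the user-consistency assumption for the two-viewpoint case. The Beta prior, the value of `alpha` and the function name `sample_post_viewpoints` are illustrative assumptions, not details taken from the slides.

```python
import random

def sample_post_viewpoints(num_posts, alpha=0.5, rng=None):
    """Draw a user's viewpoint distribution, then one viewpoint per post."""
    rng = rng or random.Random()
    # With two viewpoints, a symmetric Beta prior stands in for a Dirichlet.
    # alpha < 1 (assumed value) tends to push the user's distribution
    # towards one viewpoint, which encodes user consistency.
    p_v1 = rng.betavariate(alpha, alpha)
    # Each post's viewpoint is drawn from the user's own distribution.
    viewpoints = ["V1" if rng.random() < p_v1 else "V2"
                  for _ in range(num_posts)]
    return p_v1, viewpoints

rng = random.Random(0)
p_v1, posts = sample_post_viewpoints(num_posts=5, rng=rng)
print(f"p(V1) for this user: {p_v1:.3f}")
print(posts)
```

Because every post of a user shares one distribution, evidence from any single post informs the viewpoint of the user's other posts.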
Observation 3: User Interactions Reveal
User Viewpoints
• User interaction
– User interaction: a post in reply to another user
– Users with the same viewpoint tend to have positive
interactions among themselves, while users with different
viewpoints tend to have negative interactions
• Sample positive and negative interactions
• Model assumption
– Interaction polarity is generated based on the
viewpoint of the current post and the viewpoint of
recipient post(s)
[Figure: a sample positive interaction between User 1 and User 2 ("I agree with your post Dan. Obama is so …"). Posts 1, 2 and 3 are labeled with viewpoint V1; the viewpoints of post 5 and later posts are unknown (?).]
p(POS):
p(NEG): 1 - p(POS)
Overview of the Model
• A probabilistic model based on three
observations
– Each viewpoint's topic preference
– User consistency
– User interaction
Related Work
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic
are orthogonal
– No user interaction
• Cross-Perspective Topic Model (Fang et al.,
WSDM'12)
– Supervised model
• Subgroup detection
– Mining user opinions (Abu-Jbara et al., ACL'12)
– User interaction (Hassan et al., EMNLP'12)
– Does not model viewpoints
A Probabilistic Model
• U: # of users
• N: # of posts
• L: # of words
• z: a topic label
• x: a switch
– x=0: w is a background word
– x=1: w is a topical word
• y: a viewpoint label
• s: an interaction type
[Figure: plate diagram of the model (labels U, N, L, T, Y; variables y, s, x, z, w), showing the user-level viewpoint distribution, the viewpoint-specific topic distribution, the topic-specific word distribution and the interaction type.]
• The polarity of each interaction type is learnt
beforehand.
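The word-level part of the model can be sketched roughly as follows, given a post's viewpoint label y. This is only an illustration of the roles of x, z and w defined above: the toy vocabularies, the switch probability `p_topical`, the uniform word choice and the function name `generate_post` are assumptions, not the authors' parameterization.

```python
import random

def generate_post(viewpoint, topic_dist, topic_words, background_words,
                  num_words, p_topical=0.7, rng=None):
    """Generate the words of one post given its viewpoint label y."""
    rng = rng or random.Random()
    words = []
    for _ in range(num_words):
        # x is the switch: 1 -> topical word, 0 -> background word.
        x = 1 if rng.random() < p_topical else 0
        if x == 0:
            words.append(rng.choice(background_words))
            continue
        # z is drawn from the topic distribution of viewpoint y.
        topics, weights = zip(*topic_dist[viewpoint].items())
        z = rng.choices(topics, weights=weights, k=1)[0]
        # w is drawn from the word distribution of topic z (uniform here).
        words.append(rng.choice(topic_words[z]))
    return words

# Toy, hypothetical parameters for two viewpoints and two topics.
topic_dist = {
    "support": {"injury": 0.8, "sponsors": 0.2},
    "against": {"injury": 0.2, "sponsors": 0.8},
}
topic_words = {
    "injury": ["injury", "surgery", "hero"],
    "sponsors": ["money", "ad", "sponsors"],
}
background_words = ["the", "is", "a"]

post = generate_post("support", topic_dist, topic_words, background_words,
                     num_words=10, rng=random.Random(1))
print(post)
```

Inference runs in the opposite direction: the words and interactions are observed, and the labels y, z and x are what the model has to recover.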
Polarity Prediction for Interaction Type
• Supervised learning
– Requiring labeled data
• Unsupervised approach
– Sample sentence: "I agree with you"
– Finding interaction expressions
• Find sentences that contain mentions of the recipient (user
name or 2nd-person pronoun), e.g. "you"
• Surrounding words: a text window of 8 words, e.g. "I agree"
– Interaction polarity
• Positive if there are more positive sentiment words, otherwise
negative
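The unsupervised steps above can be sketched as follows. The tiny lexicons, the choice of splitting the 8-word window as 4 words on each side of the mention, and the function name `interaction_polarity` are assumptions made for illustration only.

```python
import re

# Tiny illustrative lexicons (assumptions, not the authors' resources).
POSITIVE = {"agree", "right", "good", "great", "support"}
NEGATIVE = {"disagree", "wrong", "bad", "stupid", "nonsense"}
SECOND_PERSON = {"you", "your", "yours"}

def interaction_polarity(post, recipient, window=8):
    """Return "POS" or "NEG" for a reply addressed to `recipient`."""
    mentions = SECOND_PERSON | {recipient.lower()}
    half = window // 2
    pos = neg = 0
    for sentence in re.split(r"[.!?]+", post.lower()):
        tokens = re.findall(r"[\w']+", sentence)
        # Collect token positions inside a window around any mention,
        # using a set so overlapping windows are not counted twice.
        context = set()
        for i, tok in enumerate(tokens):
            if tok in mentions:
                context.update(range(max(0, i - half),
                                     min(len(tokens), i + half + 1)))
        pos += sum(tokens[j] in POSITIVE for j in context)
        neg += sum(tokens[j] in NEGATIVE for j in context)
    # Positive only if positive words outnumber negative ones.
    return "POS" if pos > neg else "NEG"

print(interaction_polarity("I agree with your post Dan. Obama is so ...", "Dan"))
print(interaction_polarity("You are wrong about this.", "Dan"))
```

Following the rule on the slide, a reply with no sentiment words near a mention falls back to negative.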
Evaluation
• Data Sets
– English Data Sets
• Three most discussed threads from Abu-Jbara et al., ACL'12
– Chinese Data Sets
• Three popular controversial issues in TianYaClub (one of the
most popular Chinese online forums)
• Statistics
Data Annotation
• Identification of viewpoints
– 150 randomly sampled posts, two annotators
(Cohen's kappa agreement ≥ 0.61)
• Identification of user groups
– 150 randomly sampled users, two annotators
(Cohen's kappa agreement ≥ 0.70)
Labeling a user's viewpoint is easier
than labeling a post's viewpoint
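For reference, Cohen's kappa for two annotators can be computed as below. The label lists are made-up toy data, not the annotations used in this work, and `cohens_kappa` is a name chosen for this sketch.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

annotator_1 = ["V1", "V1", "V2", "V2"]
annotator_2 = ["V1", "V2", "V2", "V2"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"{kappa:.2f}")  # prints 0.50
```

Here the annotators agree on 3 of 4 items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.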
Baselines
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic
are orthogonal
• Degenerate variants of our model
– UIM: User interaction model (part of our model)
– JVTM: Joint viewpoint-topic model (our model without
interaction)
– JVTM-G: JVTM with a global viewpoint distribution
Identification of Viewpoints
• Task
– To identify each post's viewpoint
• Results
– Our model significantly outperforms other models (at the
10% significance level)
– Effectiveness of assumptions
• Each viewpoint's topic preference: JVTM > TAM
• User consistency: JVTM > JVTM-G
• User interaction: JVTM-UI > others
– User interaction is more important than other factors
[Figure: Averaged results of the models in identification of viewpoints]
Identification of User Groups
• Subgroup detection
– To detect ideological subgroups, i.e.: user groups with
different viewpoints
• Results
– Our model significantly outperforms other methods (at the
10% significance level)
– Effectiveness of assumptions
• Each viewpoint's topic preference: JVTM > TAM
• User consistency: JVTM > JVTM-G
• User interaction: JVTM-UI > others
[Figure: Averaged results of the models in identification of viewpoints]
Qualitative Analysis
• User interaction network on “will you vote
obama”
[Figure: user interaction network. Green (left) and white (right) nodes represent users with the two
different viewpoints discovered by our model. Red (thin) edges
represent negative interactions, while blue (thick) edges represent
positive interactions.]
• More intra-cluster positive interactions and
more inter-cluster negative interactions
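One way to check this pattern numerically is to count interactions by cluster relation and polarity. The users, cluster assignments and edges below are toy data invented for this sketch, and `interaction_summary` is a hypothetical helper.

```python
from collections import Counter

def interaction_summary(edges, cluster_of):
    """Count edges by (intra/inter cluster, polarity)."""
    counts = Counter()
    for src, dst, polarity in edges:
        scope = "intra" if cluster_of[src] == cluster_of[dst] else "inter"
        counts[(scope, polarity)] += 1
    return counts

# Toy data: two discovered viewpoint clusters (0 and 1).
cluster_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
edges = [
    ("u1", "u2", "POS"),
    ("u3", "u4", "POS"),
    ("u1", "u3", "NEG"),
    ("u2", "u4", "NEG"),
    ("u2", "u3", "POS"),
]

summary = interaction_summary(edges, cluster_of)
for key in sorted(summary):
    print(key, summary[key])
```

A clustering that matches the observation should put most positive edges in the intra-cluster rows and most negative edges in the inter-cluster rows.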
Qualitative Analysis
• Users with different viewpoints tend to have
different topic focus
[Figure: Topic focus of two viewpoints on the "LiuXiang" data set (bar chart of topic proportion by topic ID for the "Support LiuXiang" and "Against LiuXiang" viewpoints)]
Qualitative Analysis
• Top 4 topics for the "supporting LiuXiang" viewpoint (one column per topic)
Word Translation | Word Translation | Word Translation | Word Translation
刘翔 LiuXiang | 栏 hurdle | 运动员 athlete | 第一 first
冠军 champion | 伤 injury | 奥运会 Olympic | 时间 time
赛后 after-game | 成绩 record | 跟腱 Achilles tendon | 奥运 Olympic
田径 track and field | 摔倒 fall | 北京 Beijing | 获得 achieve
男子 man | 13秒 13s | 脚 foot | 一个 one
最后 finally | 手术 surgery | 伦敦 London | 届 time
刘 Liu | 决赛 final | 田联 IAAF | 情况 condition
奥运会 Olympic | 英国 Britain | 医生 doctor | 训练 train
参加 attend | 受伤 hurt | 上海 Shanghai | 重 heavy
跑 run | 赛场 field | 记者 reporter | 导致 result in
已经 already | 断裂 broken | 好 good | 遗憾 pity
纪录 record | 英雄 hero | 团队 team | 联赛 league matches
12秒 12s | 预赛 first heat | 夺冠 champion | 需要 need
当时 that time | 2012年 2012 | 跳 jump | 第二 2nd
退役 retire | 罗伯斯 Robles | 跑道 track | 伟大 great
Qualitative Analysis
• Top 4 topics for the "against LiuXiang" viewpoint (one column per topic)
Word Translation | Word Translation | Word Translation | Word Translation
帖 post | 发自 originate from | 天涯 Tianya | 天涯 Tianya
社区 community | 随时 anytime | 楼主 poster | 抵制 resist
热点 hot | 老板 boss | 猫 sneak | 骗子 liar
围观 apathetic | 政协 CPPCC | 妈 F**K | 体坛 sports
傻逼 fool | 唯金牌论 gold medal only theory | 帮 those | 水 spam
最 most | 微笑 smile | 孙子 foolish | 提 mention
钱 money | 顶 support | 啤酒 beer | 吃 eat
水军 spam | 恶心 nausea | 杨 Yang | 牌 medal
笑 laugh | 可口可乐 Coca Cola | 全家 whole family | 苦笑 bitter smile
骂 scold | 喝 drink | 别有用心 ulterior motive | 高尚 noble
孙子 foolish | 笑话 joke | 躲 hide | 有力 powerful
你们 you | 加油 cheer up | 歪风 bad tendency | 劳民伤财 a waste of money and manpower
多么 extremely | 脱离 separate | 看看 look | 黑 spam
有人 someone | 枪眼 force of public opinion | 滩 those | 黄继光 a hero
脸上 face | 神位 fame | 精神 spirit | 神像 fame
Summary
• Conclusion
– A viewpoint discovery model for threaded forums
– Modeling three observations: viewpoint-specific topic distribution (framing), user consistency, and the interplay between user interactions and viewpoints
• Future work
– Document representation: complex lexical units
– A more accurate interaction polarity classifier
– Contrastive viewpoint summarization
– Mining controversial issues and finding viewpoints
Thank you
References
• [Paul et al., AAAI'10] Paul, M. J. and Girju, R. (2010). A two-
dimensional topic-aspect model for discovering multi-faceted topics.
In AAAI.
• [Abu-Jbara et al., ACL'12] Abu-Jbara, A. et al. (2012). Subgroup
detection in ideological discussions. In ACL.
• [Fang et al., WSDM'12] Fang, Y. et al. (2012). Mining contrastive
opinions on political texts using cross-perspective topic model. In
WSDM, pages 63–72.
• [Hassan et al., EMNLP'12] Hassan et al. (2012). Detecting
subgroups in online discussions by modeling positive and negative
relations among participants. In EMNLP.