Discovering Leaders from Community Actions
Amit Goyal1
Francesco Bonchi2
Laks V.S. Lakshmanan1
Oct 27, 2008
Word of Mouth and Viral Marketing
We are more influenced by our friends than by strangers
68% of consumers consult friends and family before purchasing home electronics (Burke 2003)
Amit Goyal (University of British Columbia) http://cs.ubc.ca/~goyal/
Viral Marketing
Also known as Targeted Advertising
Initiate a chain reaction through the word-of-mouth effect
Low investments, maximum gain
Viral Marketing as an Optimization Problem
Given: Network with influence probabilities
Problem: Select top-k leaders such that by targeting them, the spread of influence is maximized
Hao Ma et al 2008, Domingos et al 2001, Richardson et al 2002, Kempe et al 2003
How to calculate true influence probabilities?
A pattern mining approach
We propose a completely different approach based on frequent pattern mining.
We focus on the actions performed by users:
Joining a community (as in flickr/facebook community)
Rating a song, a movie (as in Y! Music, Y! Movie)
The time at which actions are performed is important
Assumption: Users can see their friends’ actions
Our Contributions
Formally define the notion of leaders and its various flavors
Efficient algorithms for extracting these leaders
Demonstrate the utility and scalability of our algorithms, via an extensive set of experiments on a real-world dataset
Yahoo! Messenger (social graph)
Yahoo! Movies rating (actions log)
Rest of the talk
Framework definition:
Influence propagation on the social network
Various notions of leaders
Algorithms
Experiments
Related Work
Conclusion
Input Data (1)
A social network, i.e., an undirected graph G=(V,E) where nodes are users and edges represent social ties.
Users declare their friends, e.g., on Facebook or Yahoo! Messenger
Input Data (2)
An actions log sorted in chronological order, i.e., a relation
Actions(User, Action, Time)
Example: Jack joined Yoga community at time 5
Assumption: Users can see their friends’ actions (feeds)
Action Propagation
Jack and Jill are friends
Jack and Mary are friends
Action is “Joining the Yoga community”
Jack joined the Yoga community at time 5
Jill joined the Yoga community at time 8
Mary joined the Yoga community at time 1000
Action propagated from Jack to Jill (after 3 time units)
Action propagated from Jack to Mary (after 995 time units)
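The propagation rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `propagation_edges`, the tuple layout of the log, and the short action label "Yoga" are hypothetical; the slides only specify the relation Actions(User, Action, Time) and an undirected friendship graph.

```python
def propagation_edges(friend_pairs, actions_log, action):
    """Return (u, v, delay) triples: u and v are friends, both performed
    `action`, and u performed it strictly before v."""
    # Time at which each user performed the given action (assumed unique per user).
    when = {user: t for user, a, t in actions_log if a == action}
    edges = []
    for u, v in friend_pairs:
        if u in when and v in when and when[u] != when[v]:
            first, second = (u, v) if when[u] < when[v] else (v, u)
            edges.append((first, second, when[second] - when[first]))
    # Sort by delay so the output order is deterministic.
    return sorted(edges, key=lambda e: (e[2], e[0], e[1]))


# The example from the slide: undirected friendships and a chronological log.
friends = [("Jack", "Jill"), ("Jack", "Mary")]
log = [("Jack", "Yoga", 5), ("Jill", "Yoga", 8), ("Mary", "Yoga", 1000)]
edges = propagation_edges(friends, log, "Yoga")
print(edges)  # [('Jack', 'Jill', 3), ('Jack', 'Mary', 995)]
```

The delay is kept on each edge because the later slides use it to decide whether a propagation still counts as influence.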
Propagation Graph
[Figure: propagation graph for “Joining the Yoga community”. Jack joined at time 5, Jill at time 8, and Mary at time 1000; Joey and Ben joined at times 12 and 15.]
Can we say Mary was influenced by Jack? No.
User Influence Graph
When an action propagates from user u to user v, we may think of v being influenced by u
Influence should decay in time
Size of influence graph << Size of PG (propagation graph)
[Figure: the propagation graph (Jack, Jill, Joey, Mary, Ben; join times 5, 8, 12, 15, 1000) next to the user influence graph for Jack, which keeps Jack, Jill, Joey, and Ben (join times 5, 8, 12, 15) and leaves out Mary.]
Leaders – first definition
Who should be a leader?
For an action, should influence a sufficiently large number of users ( ≥ ψ )
For an action, should influence these users in a reasonable amount of time ( ≤ π )
Should act as a leader in a sufficiently large number of actions ( ≥ σ )
If ψ = 2, π = 15, σ = 1, then both Jack and Jill are leaders
[Figure: the Yoga propagation graph (Jack, Jill, Joey, Mary, Ben; join times 5, 8, 12, 15, 1000) with edge labels 3, 7, 4, 7, 3, and 995 time units, together with the influence graphs of individual users.]
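The leader test can be sketched as follows. Several things here are assumptions rather than facts from the slides: the edge list is reconstructed from the edge labels in the figure, the assignment of times 12 and 15 to Joey and Ben is a guess, the thresholds are read as "at least ψ", "at most π", and "at least σ", and the names `influence_set` and `leaders` are hypothetical.

```python
def influence_set(edges, times, u, pi):
    """Users reachable from u in the propagation graph whose action time is
    at most pi after u's own action time (u itself is excluded)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        for y in succ.get(x, []):
            # Times only grow along edges, so a node that is too late
            # cannot lead to a node that is still within the window.
            if y not in seen and times[y] - times[u] <= pi:
                seen.add(y)
                stack.append(y)
    return seen


def leaders(graphs, psi, pi, sigma):
    """graphs maps action -> (edges, times). A user is reported as a leader
    if it influences at least psi users within pi for at least sigma actions."""
    count = {}
    for edges, times in graphs.values():
        for u in times:
            if len(influence_set(edges, times, u, pi)) >= psi:
                count[u] = count.get(u, 0) + 1
    return sorted(u for u, c in count.items() if c >= sigma)


# Assumed reconstruction of the Yoga example (Joey at 12 and Ben at 15 is a guess).
times = {"Jack": 5, "Jill": 8, "Joey": 12, "Ben": 15, "Mary": 1000}
edges = [("Jack", "Jill"), ("Jack", "Joey"), ("Jill", "Joey"),
         ("Jill", "Ben"), ("Joey", "Ben"), ("Jack", "Mary")]
graphs = {"Yoga": (edges, times)}

print(leaders(graphs, psi=2, pi=15, sigma=1))         # ['Jack', 'Jill']
print(len(influence_set(edges, times, "Jack", 10)))   # 3
```

Under these assumptions Jack influences Jill, Joey, and Ben, Jill influences Joey and Ben, and Mary is excluded because she acted 995 time units after Jack.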
Tribe Leader
A leader may influence different users for different actions
What if a leader leads a fixed set of users for different actions?
We call such leaders Tribe Leaders
Tribes can be considered small communities
[Figure: jack leading the same set of users for actions A1, A2, and A3, which are 3 different actions.]
Additional Constraint: Genuineness
It may happen that a user acts as a leader but is in fact always a follower of other leaders
We want to avoid this kind of fake leader.
gen(Jill) = 1/3
Another constraint: confidence
[Figure: Tom, Jill, and Jack with actions A1, A2, and A3, which are 3 different actions.]
Algorithms: Overview
Assumptions:
Social graph is huge – millions of nodes
Actions log is huge – millions of tuples
For an action, size of user Influence Graph << size of Propagation Graph for all users
Our algorithms are able to extract the patterns (leaders and tribe leaders) in no more than one scan of the action log table.
Algorithms: Overview
Scan the action log table by means of a window of size π backward in time, i.e., starting from the most recent timestamp (bottom of the table if we assume tuples to be ordered by time).
Efficiently compute the influence matrix, i.e., a matrix Users x Actions
IMπ(u, a) represents the number of users influenced by u w.r.t. action a within time π
Compute leaders from IM
Example: IM10(Jack, “joining yoga community”) = 3
[Figure: influence graph for Jack with Jack, Jill, Joey, and Ben; join times 5, 8, 12, and 15.]
Computing Influence Matrix (1)
We use a bit vector to track which users are influenced by a given user.
Updated incrementally
Locking mechanism using another bit vector
0 => free bit; 1 => occupied bit
Node to bit index mapping stored in a queue
Bits must be dynamically allocated.
Time window on propagation graph: nodes S, R, T, W, V
Node InfVec
R 01010111
S 01000110
T 00010110
W 00000110
V 00000100
Queue (head first): (V,2) (W,1) (T,4) (S,6) (R,0)
Lock bit vector: 01010111
Computing Influence Matrix (2)
Slide up the current window – delete node V
Delete the entry from queue
Update the lock
Update influence vectors
Before (time window on propagation graph: nodes S, R, T, W, V):
Node InfVec
R 01010111
S 01000110
T 00010110
W 00000110
V 00000100
Queue (head first): (V,2) (W,1) (T,4) (S,6) (R,0)
Lock bit vector: 01010111
After deleting V:
Node InfVec
R 01010011
S 01000010
T 00010010
W 00000010
Queue (head first): (W,1) (T,4) (S,6) (R,0)
Lock bit vector: 01010011
Computing Influence Matrix (3)
New node P added
Issue a lock, add entry to the queue
Compute its Influence Vector by propagation
Number of followers of P = 4
IM(P,a) = 4
Before (time window on propagation graph: nodes S, R, T, W):
Node InfVec
R 01010011
S 01000010
T 00010010
W 00000010
Queue (head first): (W,1) (T,4) (S,6) (R,0)
Lock bit vector: 01010011
After adding P:
Node InfVec
P 01010111
R 01010011
S 01000010
T 00010010
W 00000010
Queue (head first): (W,1) (T,4) (S,6) (R,0) (P,2)
Lock bit vector: 01010111
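The three bookkeeping steps (lock bit vector, queue of node-to-bit mappings, influence vectors) can be replayed with a small Python sketch. It is a simplified illustration, not the authors' implementation: the class `InfluenceWindow` and its methods are hypothetical, the propagation edges (W→V, T→W, S→W, R→S, R→T, P→R) are inferred from the influence vectors on the slides, and picking the lowest free bit for a new node is an assumed allocation policy.

```python
from collections import deque


class InfluenceWindow:
    def __init__(self):
        self.lock = 0          # bit i set => bit index i is occupied
        self.queue = deque()   # (node, bit index), head of the queue on the left
        self.inf = {}          # node -> influence bit vector, stored as an int

    def add(self, node, successors, bit=None):
        """Add a node that precedes `successors` in the propagation graph and
        return its number of followers inside the window."""
        if bit is None:
            bit = 0
            while (self.lock >> bit) & 1:   # take the lowest free bit
                bit += 1
        self.lock |= 1 << bit               # issue the lock
        self.queue.append((node, bit))
        vec = 1 << bit                      # a node always covers itself
        for s in successors:
            if s in self.inf:               # only nodes still in the window
                vec |= self.inf[s]
        self.inf[node] = vec
        return bin(vec).count("1") - 1      # exclude the node's own bit

    def evict(self):
        """Drop the node at the head of the queue and release its bit."""
        node, bit = self.queue.popleft()
        mask = ~(1 << bit)
        self.lock &= mask
        del self.inf[node]
        for n in self.inf:
            self.inf[n] &= mask
        return node


def bits(x):
    return format(x, "08b")


w = InfluenceWindow()
# Bit indexes are fixed here only to mirror the state shown on slide (1).
w.add("V", [], bit=2)
w.add("W", ["V"], bit=1)
w.add("T", ["W"], bit=4)
w.add("S", ["W"], bit=6)
w.add("R", ["S", "T"], bit=0)
print(bits(w.lock), bits(w.inf["R"]))   # 01010111 01010111

w.evict()                               # slide (2): V leaves the window
print(bits(w.lock), bits(w.inf["R"]))   # 01010011 01010011

followers = w.add("P", ["R"])           # slide (3): P reuses the freed bit 2
print(followers, bits(w.inf["P"]))      # 4 01010111
```

Reusing freed bit indexes keeps every vector as short as the largest number of nodes visible in the window at once, rather than as long as the total number of users.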
Mining Tribe Leaders
Influence Matrix not enough
We use influence cube: Users x Actions x Users
ICπ(u,a,v) = 1, when user v is influenced by user u for action a within time π
We do not explicitly compute the whole cube due to sparsity.
Problem same as discovering existence of frequent itemsets of size larger than a given threshold
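The reduction to frequent itemsets can be illustrated with a brute-force check. This is a sketch under assumptions, not the optimized method the talk refers to: for one candidate leader, each action is treated as a transaction whose items are the users influenced for that action (one slice of the influence cube), and the function name, the thresholds read as "at least", and the user names a, b, c, d are all hypothetical.

```python
from itertools import combinations


def is_tribe_leader(followers_by_action, psi, sigma):
    """True if some fixed set of at least psi users follows the candidate
    in at least sigma actions."""
    if sigma < 1:
        raise ValueError("sigma must be at least 1")
    follower_sets = list(followers_by_action.values())
    # An itemset of size >= psi with support >= sigma exists exactly when
    # some sigma transactions share at least psi items.
    for group in combinations(follower_sets, sigma):
        if len(set.intersection(*group)) >= psi:
            return True
    return False


# Hypothetical slice of the influence cube for one candidate leader.
jack = {
    "A1": {"a", "b", "c"},
    "A2": {"a", "b", "d"},
    "A3": {"a", "b"},
}
print(is_tribe_leader(jack, psi=2, sigma=3))  # True: {a, b} follows in all three
print(is_tribe_leader(jack, psi=3, sigma=2))  # False: no 3 users recur in 2 actions
```

Only existence is tested, so the search stops at the first qualifying group; enumerating combinations of actions is still exponential in the worst case.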
Algorithms - Final Comments
The only truly mandatory threshold is π (time threshold)
Influence Matrix: O(TAn²) in bit level operations
T = total number of tuples in action log
A = total number of distinct actions
n = maximum number of nodes visible in any position of the time window
n << N, where N is the total number of users
Tribe Leaders:
Influence Cube: O(TAn²)
Finding existence of frequent itemsets: exponential in number of followers
But very fast due to optimizations (Bonchi 2003)
Data Preparation
Data:
Social graph: Yahoo! Instant Messenger
Actions log: Yahoo! Movies
Action = user u rated movie m at time t
The two sources are joined through common user identifiers
Started from Yahoo! Instant Messenger subgraph of “most active” users (110M nodes) and 21M ratings from Yahoo! Movies.
Ended with 217.5K nodes, 221.4K edges and 1.8M ratings.
Data characteristics: connected components
Giant component: 94K users (43.2% of connected users)
Total 46,650 connected components
Leaders Vs. Tribe leaders
π – threshold on time
σ – threshold on number of actions
ψ – threshold on number of influenced users
Number of leaders found
Run-time
Genuineness: an almost binary concept!
Top-10 tribe leaders w.r.t. tribe size
• Tribe leaders exhibit high confidence.
• Tribe leaders with low genuineness were found dominated by other tribe leaders present in the tables.
• We found many users who act as leaders in many actions without being tribe leaders.
Related Work (1)
Identifying influential users: Domingos et al 2001, Richardson et al 2002, Kempe et al 2005
Identifying influential bloggers: Agarwal et al 2008
Identifying communities in Social Networks: Hopcroft et al 2003, Kumar et al 2006, Backstrom et al 2006, Tantipathananandh et al 2007, Huang et al 2008, Friedland et al 2007
Related Work (2)
Influence and Correlation in Social Networks: Aris Anagnostopoulos et al 2008
Revenue maximization: Hartline et al 2008
Near optimal sensor placement for outbreak detection: Leskovec et al 2007
Heat Diffusion Model: Hao Ma et al 2008 (CIKM)
Conclusions
Proposed framework based on frequent pattern mining for discovering leaders in social networks
Formally define the problem of extracting leaders from social graph and actions log.
Various notions of leader, tribe leader
Their confidence and genuine variants
Efficient algorithms for extracting leaders of various flavors
Just one pass over the actions log table
Demonstrate the utility and scalability of our algorithms, via an extensive set of experiments on a real-world dataset
Yahoo! Messenger (social graph)
Yahoo! Movies rating (actions log)
Ongoing/Future Work
Gurumine: Pattern Mining System for Discovering Leaders and Tribes (Demo paper to appear in ICDE 2009)
Leadership Cube: What kind of leaders attract what kind of followers for what kind of actions?
Viral Marketing
Stronger notions of influence?
Thanks!
Additional constraint: confidence
Similarly to association rules, we can have a confidence measure for leaders.
Leadership confidence = # actions in which the user is a leader / # actions performed
Example: Let’s say Jack performed 10 actions, and in 7 of them he acted as a leader (i.e. at least ψ users followed in a short time); then conf(Jack) = 7/10
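The confidence ratio can be written out as a small Python helper. The function name and argument names are hypothetical; the counts are assumed to come from an earlier leader-extraction step.

```python
from fractions import Fraction


def leadership_confidence(actions_performed, actions_led):
    """Fraction of a user's actions in which the user acted as a leader."""
    if actions_performed <= 0:
        raise ValueError("the user must have performed at least one action")
    if not 0 <= actions_led <= actions_performed:
        raise ValueError("actions_led must lie between 0 and actions_performed")
    return Fraction(actions_led, actions_performed)


# Jack performed 10 actions and acted as a leader in 7 of them.
print(leadership_confidence(10, 7))  # 7/10
```

An exact fraction is returned so that the value can be compared against a confidence threshold without rounding effects.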