Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg, Eva Tardos, Cornell University, KDD 2003.
Maximizing the Spread of Influence through a Social Network
David Kempe, Jon Kleinberg, Eva Tardos
Cornell University, KDD 2003
Social network and spread of influence
• A social network spreads INFLUENCE among its members
– Opinions, ideas, information …
• “Word-of-mouth” effect in Viral Marketing
Motivating scenarios
1. Adoption of a new drug by doctors and patients
How to reach many patients?
2. Adoption of a new book by profs and students
How to reach many students?
3. Bloggers blogging and publishing weblogs
Follow which blogger to get the most information?
4. Battle of Water Sensor Networks
How to find the optimal sensor placement?
Problem setting
• Given
– A limited budget B for initial advertising
– Influence estimates between individuals
• Goal
– Trigger a large cascade of influence
• Question
– Which set of individuals should we target?
What do we have in this paper?
• Form models of influence in social networks
• Obtain data about a particular network (to estimate inter-personal influence)
• Devise an algorithm to maximize the spread of influence
Models of influence
• First mathematical models
– [Schelling ‘70/’78], [Granovetter ‘78]
• Large body of subsequent work
– [Rogers ‘95], [Valente ‘95], [Wasserman/Faust ‘94]
• Two basic classes of diffusion models: threshold and cascade
• General operational view:
– A social network is a directed graph; each person (individual) is a node
– Nodes start either active or inactive
– An active node may trigger activation of neighboring nodes
– Monotonicity assumption: active nodes never deactivate
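Linear threshold model
• A node $v$ has a random threshold $\theta_v \sim U[0, 1]$
• A node $v$ is influenced by each neighbor $w$ according to a weight $b_{v,w}$ such that: $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$
• A node $v$ becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$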
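Example
[Figure: step-by-step run of the linear threshold model on a small weighted graph with nodes labeled v, w, U, and X. Legend: inactive node, active node, threshold, active neighbors. The run ends ("Stop!") when no further node reaches its threshold.]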
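The linear threshold process can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `linear_threshold`, the `in_weights` dictionary layout, and the optional fixed `thresholds` argument are all assumptions made for the example.

```python
import random

def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run the linear threshold process until no new node activates.

    in_weights[v] maps each in-neighbor w of v to the weight b_{v,w};
    the weights into any node are assumed to sum to at most 1.
    """
    rng = rng or random.Random()
    nodes = set(in_weights)
    for nbrs in in_weights.values():
        nodes.update(nbrs)
    if thresholds is None:
        # Each node draws its threshold uniformly at random.
        thresholds = {v: rng.random() for v in nodes}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes - active:
            # Total weight of v's currently active in-neighbors.
            pressure = sum(b for w, b in in_weights.get(v, {}).items() if w in active)
            if pressure >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Hypothetical toy network: w and u influence v, and v influences x.
weights = {"v": {"w": 0.5, "u": 0.3}, "x": {"v": 0.2}}
fixed = {"v": 0.4, "x": 0.6, "u": 0.5, "w": 0.5}
result = linear_threshold(weights, {"w"}, thresholds=fixed)
```

With these fixed thresholds, `v` activates (weight 0.5 from `w` meets its threshold 0.4) while `x` stays inactive (0.2 is below 0.6). Because active nodes never deactivate, the order in which inactive nodes are checked does not change the final active set.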
Independent cascade model
• When node $v$ becomes active, it has a single chance of activating each currently inactive neighbor $w$
• The activation attempt succeeds with independent probability $p_{v,w}$
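Example
[Figure: step-by-step run of the independent cascade model on a small graph with edge probabilities and nodes labeled v, w, U, and X. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The run ends ("Stop!") when no newly active node remains.]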
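One run of the independent cascade model can be sketched as a queue of newly active nodes, each of which gets exactly one attempt per inactive neighbor. The function name `independent_cascade` and the `out_probs` layout (node to neighbor to probability) are assumptions made for this illustration.

```python
import random
from collections import deque

def independent_cascade(out_probs, seeds, rng=None):
    """Simulate one run; out_probs[v][w] is the probability that v activates w."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(seeds)  # newly active nodes that have not yet made their attempts
    while frontier:
        v = frontier.popleft()
        for w, p in out_probs.get(v, {}).items():
            # Single attempt on each currently inactive neighbor.
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active

# Hypothetical chain a -> b -> c with certain and impossible edges.
sure = independent_cascade({"a": {"b": 1.0}, "b": {"c": 1.0}}, ["a"])
blocked = independent_cascade({"a": {"b": 0.0}, "b": {"c": 1.0}}, ["a"])
```

Probabilities of 1.0 and 0.0 make the two example runs deterministic; intermediate values give a different active set from run to run.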
Influence maximization problem
• Influence of a node set $S$: $\sigma(S)$
– Expected number of active nodes at the end, if set $S$ is the initial active set
• Problem:
– Given a parameter $k$ (budget), find a $k$-node set $S$ to maximize $\sigma(S)$
– Constrained optimization problem with $\sigma(S)$ as the objective function
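Since the influence of a set is an expectation over random runs, a common way to approximate it is to average many simulations. The sketch below does this for the independent cascade model; `simulate_ic`, `estimate_spread`, the default of 1000 runs, and the fixed random seed are choices made for the example, not values from the slides.

```python
import random

def simulate_ic(out_probs, seeds, rng):
    """One independent cascade run; returns the number of active nodes at the end."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        v = frontier.pop()
        for w, p in out_probs.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return len(active)

def estimate_spread(out_probs, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(simulate_ic(out_probs, seeds, rng) for _ in range(runs))
    return total / runs

# Hypothetical graphs: one deterministic, one with a single fair-coin edge.
exact = estimate_spread({"a": {"b": 1.0}, "b": {"c": 0.0}}, ["a"])
noisy = estimate_spread({"a": {"b": 0.5}}, ["a"], runs=2000)
```

For the single edge with probability 0.5 the true expected spread is 1.5, and the estimate approaches it as the number of runs grows.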
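Properties of $\sigma(S)$
• Non-negative (obviously)
• Monotone: $\sigma(S \cup \{v\}) \ge \sigma(S)$
• Submodular:
– Let $N$ be a finite set
– A set function $f : 2^N \to \mathbb{R}$ is submodular iff $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ for all $S \subseteq T \subseteq N$ and all $v \in N \setminus T$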
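Submodularity is the diminishing-returns condition: adding an element to a smaller set helps at least as much as adding it to any superset. On a tiny ground set this can be checked by brute force, as in the sketch below. The helper names and the toy `coverage` function are hypothetical, and the check is exponential in the size of the ground set, so it is only useful for building intuition.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(S + v) - f(S) >= f(T + v) - f(T) for all S subset of T and v outside T."""
    ground = frozenset(ground)
    for T in subsets(ground):
        for S in subsets(T):
            for v in ground - T:
                if f(S | {v}) - f(S) < f(T | {v}) - f(T) - 1e-12:
                    return False
    return True

# Hypothetical toy function: number of items covered by the chosen sets.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def coverage(S):
    return len(set().union(*(COVER[x] for x in S)))

def squared_size(S):
    # Gains grow with the set size, so this function is not submodular.
    return len(S) ** 2
```

Here `is_submodular(coverage, COVER)` holds, while `is_submodular(squared_size, COVER)` fails because the second element added contributes more than the first.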
Bad news
• For a submodular function $f$, if $f$ only takes non-negative values and is monotone, finding a $k$-element set $S$ for which $f(S)$ is maximized is an NP-hard optimization problem.
• It is NP-hard to determine the optimum for influence maximization for both the independent cascade model and the linear threshold model.
Good news
• We can use a greedy algorithm
– Start with an empty set $S$
– For $k$ iterations:
    • Add the node $v$ to $S$ that maximizes $\sigma(S \cup \{v\}) - \sigma(S)$
• How good (bad) is it?
– Theorem: The greedy algorithm is a $(1 - 1/e)$-approximation
– The resulting set $S$ activates at least $(1 - 1/e) > 63\%$ of the number of nodes that any size-$k$ set could activate
Greedy algorithm
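The greedy hill-climbing procedure can be sketched for any set function passed in as a callable. In the example the callable is a deterministic coverage function standing in for the influence function, which in practice would have to be estimated by simulation; the names `greedy`, `REACH`, and `coverage` are hypothetical.

```python
def greedy(nodes, sigma, k):
    """Pick k nodes, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = sigma(set(chosen))
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = sigma(set(chosen) | {v}) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen

# Stand-in objective (assumption): how many items the chosen nodes reach in total.
REACH = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(S):
    return len(set().union(*(REACH[v] for v in S)))

picked = greedy(sorted(REACH), coverage, 2)
```

On this toy input the first pick is `c` (gain 4) and the second is `a` (gain 3), covering all 7 items. Every round re-evaluates every remaining node, which is what makes the plain greedy expensive when each evaluation is itself a batch of simulations.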
Other heuristics to find $S$
• High-degree
– Picks nodes with highest node degree
• Distance centrality
– Picks nodes with lowest average distance to other nodes in the network
• Random
– Randomly pick nodes
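The three baseline heuristics are simple enough to sketch directly on an adjacency-list graph. The function names are hypothetical, the graph is treated as undirected, and counting an unreachable node as distance n (the number of nodes) is an assumption of this sketch.

```python
import random
from collections import deque

def high_degree(adj, k):
    """The k nodes with the most neighbors (ties broken by node name)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]

def avg_distance(adj, source):
    """Average BFS distance from source to every other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    n = len(adj)
    # Assumption: an unreachable node counts as distance n.
    total = sum(dist.values()) + n * (n - len(dist))
    return total / (n - 1) if n > 1 else 0.0

def distance_central(adj, k):
    """The k nodes with the lowest average distance to the rest of the network."""
    return sorted(adj, key=lambda v: (avg_distance(adj, v), v))[:k]

def random_nodes(adj, k, seed=0):
    """k nodes chosen uniformly at random without replacement."""
    return random.Random(seed).sample(sorted(adj), k)

# Hypothetical toy graph: hub h with leaves a and b, plus a short path h - c - d.
adj = {"h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h", "d"], "d": ["c"]}
```

On this graph both `high_degree(adj, 2)` and `distance_central(adj, 2)` return `h` and `c`. Unlike the greedy algorithm, none of these heuristics looks at how the influence of the chosen nodes overlaps.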
Experiment setup
• Co-authorship network from physics section of arXiv.org
• A node is an author
• A link is a co-authored paper ( links)
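• LT model: The edge $(u, v)$ has weight $c_{u,v} / d_v$, where $c_{u,v}$ is the number of papers co-authored by $u$ and $v$ and $d_v$ is the degree of $v$
• IC model:
– The edge $(u, v)$ has prob. $1 - (1 - p)^{c_{u,v}}$, where $p$ is a uniform per-paper activation probability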
Experiment result on IC model
• Result on LT model is similar
• Not sensitive to different algorithms at high $p$
Cost-effective Outbreak Detection in Networks
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance
Carnegie Mellon University, KDD 2007
Original Greedy
Inefficient!!!
15,000 nodes take a few days to complete
Complexity Redundant!!!
Submodularity property
• Recall: $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ for $S \subseteq T$
• When adding a vertex v to seed set S, the gain of adding v is larger if S is smaller
• Therefore: a large number of nodes do not need to be re-evaluated
CELF algorithm
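If the previously computed marginal gain of $v$ is no larger than the best marginal gain found in the current round, then discard $v$ for this round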
700 times faster than the original greedy!!!
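The lazy-evaluation idea behind CELF can be sketched with a priority queue of possibly stale marginal gains. This is an illustration of the idea only, not the authors' full algorithm (which also handles node costs); `lazy_greedy`, `REACH`, and `coverage` are hypothetical names, and the deterministic coverage function stands in for the influence function.

```python
import heapq

def lazy_greedy(nodes, f, k):
    """Greedy selection that re-evaluates a node only when its stale gain tops the queue."""
    chosen = []
    current = f(set())
    evaluations = 1
    # Max-heap via negated gains; each entry records the round its gain was computed in.
    heap = []
    for v in nodes:
        heap.append((-(f({v}) - current), v, 0))
        evaluations += 1
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # The gain is fresh; by submodularity the stale gains below it are upper bounds.
            chosen.append(v)
            current += -neg_gain
        else:
            gain = f(set(chosen) | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(chosen)))
    return chosen, evaluations

# Stand-in objective (assumption): how many items the chosen nodes reach in total.
REACH = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(S):
    return len(set().union(*(REACH[v] for v in S)))

picked, evals = lazy_greedy(sorted(REACH), coverage, 2)
```

On this toy input the lazy version selects the same two nodes as the plain greedy but needs only one re-evaluation in the second round: once the refreshed gain of `a` (3) still beats the stale gains of `b` and `d`, those two are never recomputed.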
Efficient Influence Maximization in Social Networks
Wei Chen, Yajun Wang, Siyu Yang
Microsoft Research, Tsinghua University, KDD 2009
Improved greedy
• Construct a graph $G'$
• Obtain $G'$ by removing edges not for propagation from $G$ with prob. $1 - p$
• Use DFS/BFS to find out the set of vertices reachable from $S$ in $G'$: $R_{G'}(S)$
• Also obtain $R_{G'}(\{v\})$ for every vertex $v \notin S$
• Remove overlapping elements
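The steps above can be sketched as follows: sample a graph by keeping each edge with the propagation probability, find what the current seeds already reach, and credit each candidate only with vertices not already reached. This is one reading of the slide, not the authors' code; the function names, the uniform probability `p`, and the number of sampled graphs are assumptions.

```python
import random
from collections import deque

def sample_live_graph(out_edges, p, rng):
    """Keep each edge independently with probability p (remove it with prob. 1 - p)."""
    return {v: [w for w in nbrs if rng.random() < p] for v, nbrs in out_edges.items()}

def reachable(graph, sources):
    """BFS: the set of vertices reachable from any source, sources included."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def estimate_gains(out_edges, seeds, p, rounds=200, seed=0):
    """Average number of newly reached vertices per candidate over sampled graphs."""
    rng = random.Random(seed)
    nodes = set(out_edges) | {w for nbrs in out_edges.values() for w in nbrs}
    score = {v: 0.0 for v in nodes - set(seeds)}
    for _ in range(rounds):
        live = sample_live_graph(out_edges, p, rng)
        covered = reachable(live, seeds)
        for v in score:
            if v not in covered:
                # Overlap with what the seeds already reach is removed.
                score[v] += len(reachable(live, [v]) - covered)
    return {v: s / rounds for v, s in score.items()}

# Hypothetical graph: a -> b -> c and d -> c.
edges = {"a": ["b"], "b": ["c"], "d": ["c"]}
all_live = estimate_gains(edges, ["a"], p=1.0, rounds=5)
none_live = estimate_gains(edges, ["a"], p=0.0, rounds=5)
```

One sampled graph serves every candidate vertex at once, which is where the saving over simulating each candidate separately comes from.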
Improved greedy
15-34% faster than the original greedy!!!
Mix with CELF
• Cons
– CELF must consider all vertices in the first round, but the number of vertices to evaluate can be decreased in later rounds
– Improved greedy must build $G'$ $R$ times
• Mix
– First vertex: use Improved greedy
– Other vertices: use CELF