Maximizing the Spread of Influence through a Social Network

David Kempe, Jon Kleinberg, Eva Tardos

Cornell University, KDD 2003

Social network and spread of influence

• A social network spreads influence among its members

– Opinions, ideas, information, …

• “Word-of-mouth” effect in Viral Marketing

Motivating scenarios

1. Adoption of a new drug by doctors and patients

How to reach many patients?

2. Adoption of a new book by profs and students

How to reach many students?

3. Bloggers blogging and publishing weblogs

Follow which blogger to get the most information?

4. Battle of Water Sensor Networks

How to find the optimal sensor placement?

Problem setting

• Given

– A limited budget B for initial advertising

– Influence estimates between individuals

• Goal

– Trigger a large cascade of influence

• Question

– Which set of individuals should we target?

What do we have in this paper?

• Form models of influence in social networks

• Obtain data about a particular network (to estimate inter-personal influence)

• Devise an algorithm to maximize the spread of influence

Models of influence

• First mathematical models

– [Schelling ‘70/’78], [Granovetter ‘78]

• Large body of subsequent work

– [Rogers ‘95], [Valente ‘95], [Wasserman/Faust ‘94]

• Two basic classes of diffusion models: threshold and cascade

• General operational view:

– A social network is a directed graph; each person (individual) is a node

– Nodes start either active or inactive

– An active node may trigger activation of neighboring nodes

– Monotonicity assumption: active nodes never deactivate

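Linear threshold model

• A node $v$ has a random threshold $\theta_v \sim U[0,1]$

• A node $v$ is influenced by each neighbor $w$ according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$

• A node $v$ becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$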
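The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `linear_threshold`, the `in_weights` layout (for each node, a dict from in-neighbor to weight) and the optional `thresholds` argument are assumptions made for the example. Nodes are scanned repeatedly rather than in synchronous steps; because activation is monotone, the final active set is the same.

```python
import random

def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run one linear threshold cascade and return the final active set.

    in_weights[v] maps each in-neighbor w of v to its weight b_{v,w};
    the weights into any node are assumed to sum to at most 1.
    thresholds (hypothetical parameter) fixes theta_v for reproducible runs.
    """
    rng = rng or random.Random()
    if thresholds is None:
        # Each node draws its threshold uniformly at random.
        thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no node can be activated any more
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            # Total weight of the neighbors of v that are already active.
            total = sum(b for w, b in nbrs.items() if w in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Passing fixed thresholds makes a run deterministic, which is convenient for checking small examples by hand.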

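Example

[Figure: linear threshold cascade on a small network with nodes including u, v, w and x. Edge weights shown: 0.5, 0.3, 0.2, 0.5, 0.1, 0.4, 0.3, 0.2, 0.6, 0.2. Legend: inactive node, active node, threshold, active neighbors. The cascade runs until it stops.]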

Independent cascade model

• When a node $v$ becomes active, it has a single chance of activating each currently inactive neighbor $w$

• The activation attempt succeeds with probability $p_{v,w}$, independently of all other attempts
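A single run of the independent cascade process can be sketched as follows. The function name `independent_cascade` and the data layout (`graph` as an adjacency dict, `prob` as a dict keyed by directed edge) are assumptions for this illustration. Newly active nodes are processed from a queue instead of in synchronous rounds; since every edge is still tried at most once, the distribution of the final active set is unchanged.

```python
import random
from collections import deque

def independent_cascade(graph, prob, seeds, rng=None):
    """Run one independent cascade and return the final active set.

    graph[v] lists the out-neighbors of v; prob[(v, w)] is the probability
    that an active v activates w (both layouts are illustrative assumptions).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()
        for w in graph.get(v, ()):
            # v gets exactly one attempt on each currently inactive neighbor.
            if w not in active and rng.random() < prob[(v, w)]:
                active.add(w)
                frontier.append(w)
    return active
```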

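Example

[Figure: independent cascade on a small network with nodes including u, v, w and x. Edge probabilities shown: 0.5, 0.3, 0.2, 0.5, 0.1, 0.4, 0.3, 0.2, 0.6, 0.2. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The cascade runs until it stops.]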

Influence maximization problem

• Influence of a node set $S$: $\sigma(S)$

– Expected number of active nodes at the end, if set $S$ is the initial active set

• Problem:

– Given a parameter $k$ (budget), find a $k$-node set $S$ to maximize $\sigma(S)$

– Constrained optimization problem with $\sigma(S)$ as the objective function
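Because the objective is an expectation over random cascades, it is commonly estimated by averaging many simulated runs. The sketch below does this for the independent cascade model with one uniform edge probability `p`; the names `simulate_ic` and `estimate_sigma`, the uniform probability and the default of 1000 runs are assumptions made for the example.

```python
import random

def simulate_ic(graph, p, seeds, rng):
    """Return the number of active nodes after one independent cascade run."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        v = frontier.pop()
        for w in graph.get(v, ()):
            # One activation attempt per edge, succeeding with probability p.
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return len(active)

def estimate_sigma(graph, p, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected final number of active nodes."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    total = sum(simulate_ic(graph, p, seeds, rng) for _ in range(runs))
    return total / runs
```

More runs give a tighter estimate at proportionally higher cost, which is why evaluating the objective dominates the running time of any algorithm built on top of it.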

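Properties of $\sigma(S)$

• Non-negative (obviously)

• Monotone: $\sigma(S \cup \{v\}) \ge \sigma(S)$

• Submodular:

– Let $N$ be a finite set

– A set function $f : 2^N \to \mathbb{R}$ is submodular iff $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ for all $S \subseteq T \subseteq N$ and all $v \in N \setminus T$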
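On a small ground set, monotonicity and submodularity (diminishing marginal gains) can be checked by brute force. The sketch below is only an illustration: the helper names and the toy `coverage` function (the size of a union of sets, a standard example of a monotone submodular function) are assumptions, not part of the slides.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone_submodular(f, ground):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) >= 0 for all S <= T, v not in T."""
    ground = frozenset(ground)
    for t in subsets(ground):
        for s in subsets(t):
            for v in ground - t:
                gain_s = f(s | {v}) - f(s)
                gain_t = f(t | {v}) - f(t)
                if gain_s < 0 or gain_s < gain_t:
                    return False
    return True

# Toy example (assumed data): each element covers a few items.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

def coverage(s):
    """Number of distinct items covered by the elements of s."""
    return len(set().union(*(covers[x] for x in s)))
```

The check enumerates all pairs of nested subsets, so it is exponential in the size of the ground set and only suitable for toy inputs.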

Bad news

• For a submodular function $f$ that takes only non-negative values and is monotone, finding a $k$-element set $S$ for which $f(S)$ is maximized is an NP-hard optimization problem.

• It is NP-hard to determine the optimum for influence maximization under both the independent cascade model and the linear threshold model.

Good news

• We can use Greedy algorithm– Start with an empty set

– For iterations:• Add node to that maximizes

• How good (bad) is it?– Theorem: The greedy algorithm is a approx.

– The resulting set activated at least of the number of nodes that any size- set could activate

Greedy algorithm
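A minimal sketch of the greedy (hill-climbing) selection follows. It takes the objective as a black-box function `sigma`, which in practice would be a simulation-based estimate of the influence; here a deterministic toy objective built from assumed reachability sets (`reach`) stands in so the behavior is easy to follow. All names are illustrative.

```python
def greedy(nodes, k, sigma):
    """Pick k nodes, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = sigma(chosen)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = sigma(chosen + [v]) - base  # marginal gain of adding v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        chosen.append(best)
    return chosen

# Toy objective (assumed data): nodes reached from each seed when every
# activation attempt succeeds, so sigma is the size of the union.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"},
         "d": {"d", "e"}, "e": {"e"}}

def sigma(seeds):
    return len(set().union(*(reach[v] for v in seeds)))
```

Each of the k rounds evaluates the objective once per remaining node, so the cost is roughly k times the number of nodes times the cost of one evaluation.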

Other heuristics to find the seed set $S$

• High-degree

– Picks the $k$ nodes with the highest node degree

• Distance centrality

– Picks the $k$ nodes with the lowest average distance to other nodes in the network

• Random

– Picks $k$ nodes uniformly at random
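The three baseline heuristics can be sketched on an undirected graph stored as an adjacency dict. The function names are illustrative, and one convention is an assumption of this sketch: when computing average distance, a node that cannot be reached is counted as being at distance n (the number of nodes), so that poorly connected nodes rank last.

```python
import random
from collections import deque

def high_degree(adj, k):
    """Return the k nodes with the most neighbors."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

def average_distance(adj, source):
    """Average BFS distance from source to all other nodes."""
    n = len(adj)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    unreachable = n - len(dist)  # assumption: each counts as distance n
    return (sum(dist.values()) + unreachable * n) / max(n - 1, 1)

def distance_central(adj, k):
    """Return the k nodes with the lowest average distance to the others."""
    return sorted(adj, key=lambda v: average_distance(adj, v))[:k]

def random_seeds(adj, k, rng=None):
    """Return k distinct nodes chosen uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(sorted(adj), k)
```

None of these heuristics looks at the diffusion process itself, which is the main difference from the greedy approach.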

Experiment setup

• Co-authorship network from physics section of arXiv.org

• A node is an author

• A link is a co-authored paper ( links)

• LT model: The edge has weight

• IC model:

– The edge has prob.

Experiment result on IC model

• Result on LT model is similar

• Not sensitive to different algorithms at high

Cost-effective Outbreak Detection in Networks

Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance

Carnegie Mellon University, KDD 2007

Original Greedy

Inefficient!!!

A network with 15,000 nodes takes a few days to complete

Complexity Redundant!!!

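Submodularity property

• Recall: $\sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(T \cup \{v\}) - \sigma(T)$ for all $S \subseteq T$, where $\sigma$ is the influence function

• When adding a vertex v to the seed set S, the marginal gain of adding v is larger (or equal) if S is smaller

• Therefore, a large number of nodes do not need to be re-evaluated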

CELF algorithm

If then discard

700 times faster than the original greedy!!!
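The lazy-evaluation idea behind CELF can be sketched with a priority queue: marginal gains computed in earlier rounds are upper bounds on current gains (by submodularity), so a node is re-evaluated only when it reaches the top of the queue. This is a hedged sketch, not the authors' implementation; the function name `celf`, the returned evaluation counter, and the toy objective built from assumed reachability sets are all illustrative.

```python
import heapq

def celf(nodes, k, sigma):
    """Lazy greedy selection; returns (chosen nodes, number of gain evaluations)."""
    chosen = []
    current = sigma(chosen)
    # Heap entries: (-gain, tie_breaker, node, round in which gain was computed).
    heap = [(-(sigma([v]) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(chosen) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date and beats every stale upper bound: take v.
            chosen.append(v)
            current -= neg_gain
        else:
            # Stale gain: recompute against the current seed set and reinsert.
            gain = sigma(chosen + [v]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(chosen)))
    return chosen, evaluations

# Toy objective (assumed data), deterministic for illustration.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"},
         "d": {"d", "e"}, "e": {"e"}}

def sigma(seeds):
    return len(set().union(*(reach[v] for v in seeds)))
```

The result matches plain greedy selection whenever the objective is submodular; the saving comes entirely from skipping re-evaluations of nodes whose old gain is already too small to win.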

Efficient Influence Maximization in Social Networks

Wei Chen, Yajun Wang, Siyu Yang

Microsoft Research, Tsinghua University, KDD 2009

Improved greedy

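• Construct a graph $G'$

• Obtain $G'$ from $G$ by removing the edges not used for propagation: each edge is removed with prob. $1 - p$

• Use DFS/BFS to find the set of vertices reachable from $S$ in $G'$

• Also obtain the set of vertices reachable from each candidate vertex $v$ in $G'$

• Remove overlapping elements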
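The steps above can be sketched as follows for the independent cascade model with a uniform edge probability `p`. This is an illustration under stated assumptions, not the paper's implementation: the names `reachable` and `improved_greedy`, the `rounds` parameter (the number of sampled graphs per selection round) and the per-vertex search are choices made for clarity, and the sketch does not attempt any faster way of computing all reachable sets at once.

```python
import random

def reachable(adj, sources):
    """Return all vertices reachable from sources by depth-first search."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def improved_greedy(graph, p, k, rounds=200, seed=0):
    """Select k seeds by scoring candidates on randomly sampled subgraphs."""
    rng = random.Random(seed)
    nodes = sorted(graph)  # every node is assumed to be a key of graph
    chosen = []
    for _ in range(k):
        score = {v: 0 for v in nodes if v not in chosen}
        if not score:
            break
        for _ in range(rounds):
            # Sample a subgraph: keep each edge with probability p.
            live = {u: [w for w in graph[u] if rng.random() < p] for u in nodes}
            covered = reachable(live, chosen)
            for v in score:
                if v not in covered:
                    # Count only vertices not already reached from the seeds.
                    score[v] += len(reachable(live, [v]) - covered)
        chosen.append(max(score, key=score.get))
    return chosen
```

One sampled graph serves every candidate in that round, which is where the saving over simulating a separate cascade per candidate comes from.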

Improved greedy

15-34% faster than the original greedy!!!

Mix with CELF

• Cons

– CELF must evaluate all vertices in the first round, but the number of vertices to evaluate can be decreased in later rounds

– Improved greedy must build G’ R times

• Mix

– First vertex: use Improved greedy

– Other vertices: use CELF
