Maximizing the Spread of Influence through a Social Network

By David Kempe, Jon Kleinberg, Eva Tardos

Report by Joe Abrams

Social Networks

Infectious disease networks

Viral Marketing

• Example: Hotmail

• Included service’s URL in every email sent by users

• Grew from zero to 12 million users in 18 months with small advertising budget

Domingos and Richardson (2001, 2002)

• Introduction to maximization of influence over social networks

• Intrinsic Value vs. Network Value

• Expected Lift in Profit (ELP)

• Epinions, “web of trust”, 75,000 users and 500,000 edges

Domingos and Richardson (2001, 2002)

• Viral marketing (using greedy hill-climbing strategy) worked very well compared with direct marketing

• Robust (69% of total lift knowing only 5% of edges)

Diffusion Model: Linear Threshold Model

• Each node (consumer) influenced by set of neighbors; has threshold Θ from uniform distribution [0,1]

• When combined influence reaches threshold, node becomes “active”

• Active node now can influence its neighbors

• Weighted edges
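The bullets above can be sketched as a small simulation. This is a minimal illustration, not the authors' code: the function name `linear_threshold`, the `weights` layout (each node mapped to its incoming edge weights), and the optional `thresholds` argument are all assumptions made for the example.

```python
import random


def linear_threshold(weights, seeds, thresholds=None, rng=None):
    """Run one Linear Threshold cascade and return the final active set.

    weights[v] maps each in-neighbor u of v to the edge weight w(u, v)
    (hypothetical layout; the incoming weights of v should sum to at most 1).
    thresholds may be passed in for reproducibility; otherwise each node
    draws its threshold uniformly at random from [0, 1).
    """
    rng = rng or random.Random()
    nodes = set(weights) | set(seeds)
    for in_nbrs in weights.values():
        nodes |= set(in_nbrs)
    if thresholds is None:
        thresholds = {v: rng.random() for v in nodes}

    active = set(seeds)
    while True:
        newly = set()
        for v in nodes - active:
            # Combined influence = total weight of v's currently active in-neighbors.
            influence = sum(w for u, w in weights.get(v, {}).items() if u in active)
            if influence > 0 and influence >= thresholds[v]:
                newly.add(v)
        if not newly:
            return active
        active |= newly  # newly active nodes can influence neighbors next round
```

Passing fixed thresholds makes a run repeatable; leaving them out gives one random cascade per call.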

Diffusion Model: Independent Cascade Model

• Each active node has a probability p of activating a neighbor

• At time t+1, all newly activated nodes try to activate their neighbors

• Only one attempt per node on each target

• Akin to turn-based strategy game?
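A minimal sketch of the cascade described above, assuming a single uniform probability `p` for every edge. The function name `independent_cascade` and the adjacency-dict layout of `graph` are assumptions for illustration only.

```python
import random


def independent_cascade(graph, seeds, p, rng=None):
    """Run one Independent Cascade and return the final active set.

    graph[u] is an iterable of u's out-neighbors (hypothetical layout).
    Each newly activated node gets exactly one attempt, with success
    probability p, on each neighbor that is still inactive.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time step
    while frontier:
        newly = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly.append(v)
        frontier = newly  # only these nodes try again at time t+1
    return active
```

Because a node appears in `frontier` only once, it never retries a neighbor it failed to activate, which is the "one attempt" rule from the slide.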

Influence Maximization

• Using greedy hill-climbing strategy, can approximate optimum to within a factor of (1 – 1/e – ε), or ~63%

• Proven using theories of submodular functions (diminishing returns)

• Applies to both diffusion models
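Greedy hill-climbing here means: repeatedly add the single node with the largest marginal gain in spread. The sketch below is a generic version under stated assumptions: `greedy_seed_set` and the `spread` callback are hypothetical names, and the toy `coverage` function merely stands in for an influence estimate (which in practice would likely be an average over many simulated cascades).

```python
def greedy_seed_set(nodes, k, spread):
    """Pick k seeds by greedy hill-climbing on the set function spread(S).

    spread is any callable mapping a set of nodes to a number; the
    (1 - 1/e) style guarantee relies on it being monotone and submodular.
    """
    chosen = []
    for _ in range(k):
        base = spread(set(chosen))
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = spread(set(chosen) | {v}) - base  # marginal gain of adding v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        chosen.append(best)
    return chosen


# Toy stand-in for influence: the number of distinct nodes "reached" by the seeds.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "d": {"d", "e"}, "e": {"e"}}


def coverage(seed_set):
    covered = set()
    for s in seed_set:
        covered |= reach[s]
    return len(covered)


print(greedy_seed_set(["a", "b", "d", "e"], 2, coverage))  # ['a', 'd']
```

The second pick shows diminishing returns at work: "b" adds nothing once "a" is chosen, so "d" wins even though "b" and "d" look equally good in isolation.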

Testing on network data

• Co-authorship network

• High-energy physics theory section of www.arxiv.org

• 10,748 nodes (authors) and ~53,000 edges

• Multiple co-authored papers listed as parallel edges (greater weight)

Testing on network data

• Linear Threshold: influence weighted by # of parallel edges, inversely weighted by degree of target node: w = c_{u,v} / d_v

• Independent Cascade: p set at 1% and 10%; total probability for u → v is

1 – (1 – p)^{c_{u,v}}

• Weighted Cascade: p = 1/d_v
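To make the three parameter choices concrete, the sketch below computes them from a list of co-authorship pairs in which repeated pairs represent parallel edges. The function name `edge_parameters` and the input layout are assumptions, as is the choice to count parallel edges toward the degree d_v (which makes each node's incoming Linear Threshold weights sum to 1).

```python
from collections import Counter


def edge_parameters(coauthor_pairs, p):
    """Return per-direction parameters for every connected pair (u, v).

    coauthor_pairs: list of undirected (u, v) pairs; a pair listed c times
    stands for c parallel edges. Degrees count parallel edges (assumption).
    """
    c = Counter()  # c[(u, v)] = number of parallel edges between u and v
    d = Counter()  # d[v] = degree of v
    for u, v in coauthor_pairs:
        c[(u, v)] += 1
        c[(v, u)] += 1
        d[u] += 1
        d[v] += 1

    params = {}
    for (u, v), c_uv in c.items():
        params[(u, v)] = {
            "lt_weight": c_uv / d[v],        # Linear Threshold: w = c_{u,v} / d_v
            "ic_prob": 1 - (1 - p) ** c_uv,  # Independent Cascade: 1 - (1 - p)^c_{u,v}
            "wc_prob": 1 / d[v],             # Weighted Cascade: p = 1 / d_v
        }
    return params
```

For example, with pairs `[("a", "b"), ("a", "b"), ("b", "c")]` node b has degree 3, so a's weight on b is 2/3 and c's weight on b is 1/3.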

Algorithms

• Greedy hill-climbing

• High degree: nodes with greatest number of edges

• Distance centrality: lowest average distance with other nodes

• Random
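The three baseline heuristics listed above are simple enough to sketch directly. All function names are hypothetical, `graph` is assumed to be an undirected adjacency dict with every node present as a key, and charging unreachable nodes a distance of n is an assumption made so that disconnected graphs still get a finite score.

```python
import random
from collections import deque


def high_degree(graph, k):
    """Top-k nodes by number of incident edges."""
    return sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k]


def average_distance(graph, source):
    """Mean BFS distance from source to all other nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    n = len(graph)
    unreachable = n - len(dist)
    # Unreachable nodes are charged distance n (assumption, see lead-in).
    return (sum(dist.values()) + unreachable * n) / max(n - 1, 1)


def distance_central(graph, k):
    """Top-k nodes with the lowest average distance to other nodes."""
    return sorted(graph, key=lambda v: average_distance(graph, v))[:k]


def random_nodes(graph, k, rng=None):
    """k nodes chosen uniformly at random, without replacement."""
    rng = rng or random.Random()
    return rng.sample(list(graph), k)
```

On a star graph, both `high_degree` and `distance_central` pick the hub first, which is why the two heuristics often overlap on real networks.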

Results: Linear Threshold Model

Greedy: ~40% better than central, ~18% better than high degree

Results: Weighted Cascade Model

Results: Independent Cascade, p = 1%

Results: Independent Cascade, p = 10%

Advantages of Random Selection

Generalized models

• Generalized Linear Threshold: for node v, influence of neighbors not necessarily sum of individual influences

• Generalized Independent Cascade: for node v, probability p depends on set of v’s neighbors that have previously tried to activate v

• Models computationally equivalent, impossible to guarantee approximation

Non-Progressive Threshold Model

• Active nodes can become inactive

• Similar concept: at each time t, whether or not v becomes/stays active depends on if influence meets threshold

• Can “intervene” at different times; need not perform all interventions at t = 0

• Answer to non-progressive model with graph G equivalent to progressive model with layered graph G^τ
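One way to picture the layered graph G^τ is as one copy of every node per time step, with each edge of G redrawn from one layer to the next. The sketch below is only an illustration under assumptions: the function name `layered_graph`, the (node, t) labelling, and the layer indexing 0 to τ – 1 are not taken from the slides.

```python
def layered_graph(nodes, edges, tau):
    """Build a time-layered copy of a directed graph.

    nodes: iterable of node labels; edges: iterable of directed (u, v) pairs.
    Returns (layered_nodes, layered_edges), where each layered node is a
    (node, t) pair for t in 0 .. tau - 1 and each original edge (u, v)
    becomes an edge from (u, t - 1) to (v, t).
    """
    layered_nodes = [(v, t) for t in range(tau) for v in nodes]
    layered_edges = [
        ((u, t - 1), (v, t))
        for t in range(1, tau)
        for u, v in edges
    ]
    return layered_nodes, layered_edges
```

Activating the copy (v, t) then plays the role of an intervention on v at time t, which fits the point that interventions need not all happen at t = 0.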

General Marketing Strategies

• Can divide up total budget κ into equal increments of size δ

• For greedy hill-climbing strategy, can guarantee performance within factor of

1 – e^[-(κ * γ)/(κ + δ * n)]

• As δ decreases relative to κ, result approaches 1 – e^(-1) ≈ 63%
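A quick numeric check of the bound above shows how it behaves as the increment δ shrinks. The function name `greedy_guarantee` is hypothetical, and since the slide does not define γ or n, they are simply treated as inputs here (the sample values are illustrative, not from the paper).

```python
import math


def greedy_guarantee(kappa, gamma, delta, n):
    """Evaluate 1 - e^[-(kappa * gamma) / (kappa + delta * n)]."""
    return 1 - math.exp(-(kappa * gamma) / (kappa + delta * n))


# Illustrative values only: fixed budget, gamma = 1, shrinking increment delta.
for delta in (1.0, 0.1, 0.01, 0.0):
    print(delta, round(greedy_guarantee(10, 1.0, delta, 100), 3))
```

With γ = 1 the value rises toward 1 – 1/e ≈ 0.632 as δ goes to 0, matching the last bullet.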

Strengths of paper

• Showed results in two complementary fashions: theoretical models and test results using real dataset

• Demonstrated that greedy hill-climbing strategy could guarantee results of at least ~63% of optimum

• Used specific and generalized versions of two different diffusion models

Weaknesses of paper

• Doesn’t fully explain methodology of greedy hill-climbing strategy

• Lots of work not shown; simply refers to work done in other papers

• Threshold value uniformly distributed?

• Influence inversely weighted by degree of target?

Questions?
