Authority, Trust and Influence
The Complex Network of Social Media
Bill Rand
What does Facebook have to do with Complexity?
Social Media is an archetypal Complex System.
–Social Networks are where individuals meet groups
–Social Science meets Technology
–Psychology meets Information Processing
–Design meets Engineering
–Sociology meets Innovation
–All Driven by Constant Change, Permanent Evolution
Complex Systems are about individual actions resulting in emergent patterns, and how those patterns feed back to affect individual actions.
Questions
• Where does authority come from in Social Media?
  • Authority is not just what you know but who you know and who knows you...
• Who are the most influential individuals in social media?
  • It may not just be those who are the most popular...
• How is trust earned in social media?
  • We can design new social network mechanisms that increase trust in social networks...
Authority
Joint work with Shanchan Wu, Tamer Elsayed and Louiqa Raschid
Supported by NSF Awards CMMI 0753124, IIS 0960963, & IIS 1018361
Motivation
• Example
  • Gap logo fiasco in the fall of 2010 (10/4).
  • Gap introduced a new logo, replacing almost overnight the iconic logo it had used for 20 years.
  • There was an immediate outpouring of negative comments.
  • Gap quickly reverted to the old logo (10/12).
• Goal
  • Identify which author (blog channel) is likely to become an authority on Topic X (e.g., Gap Logo Redesign) in the near future.
Blog Channels
• A blog channel is an event stream of posts or blog entries.
• There are links between different blog channels, and links pointing outside of the blogosphere.
[Figure: posts p1 to p10 distributed across blog channels b1 to b4, with links between them]
Problem Definition
• Future Author(ity) Prediction Problem (FAPP)
  • Given a focal query post on Topic X, what other blog channels (authors) are likely to post on that topic in the (near) future?
  • The goal is to identify up to K channels that will publish a post on Topic X in the near future.
[Figure: blog channels b1 to b4 posting on topics X, Y, Z and W; which of the authors A to E will post on X next?]
Features
• Content Features
  • Post Similarity
  • Profile Similarity
• Network Features
  • Blog-Blog Links
  • External (non-blog) Links
Prediction Methods
• PROF (Profile-Based Prediction)
  • PROF retrieves the top K blog channels ranked by their similarity scores to a focal query post q.
• VOTE (Voting-Based Prediction)
  • VOTE chooses the top K channels using the aggregate similarity score of all historical posts in a channel b with a focal query post q.
• RSVMP (Ranking SVM Based Prediction)
  • Content Features
    • post-post similarity, post-profile similarity, profile-profile similarity, consistency scores, named entities
  • Network Features
    • links, external links
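To make PROF and VOTE concrete, here is a minimal Python sketch. It assumes posts are plain strings compared by cosine similarity over bag-of-words counts, and that a channel's profile is simply all of its posts merged; the slide does not specify the actual similarity measures or profile construction, and RSVMP (a learned ranker over the listed features) is not shown. The function names `prof`, `vote` and `top_k` and the toy channels are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts (assumption: lowercase whitespace tokenization)
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (missing terms count as 0)
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(scores, k):
    # Highest score first; ties broken by channel name for determinism
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


def prof(query, channels, k):
    # PROF sketch: one profile per channel (here, all of its posts merged)
    q = vectorize(query)
    scores = {}
    for name, posts in channels.items():
        profile = Counter()
        for post in posts:
            profile.update(vectorize(post))
        scores[name] = cosine(q, profile)
    return top_k(scores, k)


def vote(query, channels, k):
    # VOTE sketch: sum the similarity of every historical post in the channel to q
    q = vectorize(query)
    scores = {name: sum(cosine(q, vectorize(p)) for p in posts)
              for name, posts in channels.items()}
    return top_k(scores, k)


channels = {  # hypothetical toy data
    "b1": ["gap unveils new logo", "logo redesign backlash"],
    "b2": ["election polls tighten", "debate recap"],
    "b3": ["retail earnings", "gap store sales"],
}
print(prof("gap logo redesign", channels, 2))  # ['b1', 'b3']
print(vote("gap logo redesign", channels, 2))  # ['b1', 'b3']
```

The two methods agree on this toy data; they diverge when a channel has one highly relevant post among many unrelated ones, since merging dilutes the profile while voting still credits the single match.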
Dataset and Metrics
• Original Data
  • From Spinn3r, 142 GB, two months; 44 million blog posts.
• Data for experiments
  • English blog channels only.
  • Blog channels containing between 30 and 120 posts.
• Metric: mean average precision (MAP)

Time range                              07/30/08–10/1/08
Number of blog posts                    2,185,810
Number of blog channels                 42,005
Avg. number of posts per blog channel   52.04
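MAP, the metric used here, can be sketched as follows. This assumes the standard definition of average precision: precision at each rank where a correctly predicted channel appears, divided by the number of channels that actually posted. The slide does not say whether the authors normalize by K or by the number of relevant channels, and the example rankings are made up.

```python
def average_precision(ranked, relevant):
    # Sum of precision@k over the ranks k where a relevant channel appears,
    # normalized by the number of relevant channels (assumption: standard AP)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, channel in enumerate(ranked, start=1):
        if channel in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(queries):
    # queries: list of (ranked predicted channels, set of channels that actually posted)
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


queries = [  # hypothetical predictions and ground truth
    (["b1", "b2", "b3"], {"b1", "b3"}),  # AP = (1/1 + 2/3) / 2
    (["b4", "b2"], {"b2"}),              # AP = (1/2) / 1
]
print(round(mean_average_precision(queries), 4))  # 0.6667
```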
Results
Training data: 30 days
Test data (for ground truth): 10 days
Diffusion Stage

cRatio   0-0.2   0.2-0.4   0.4-0.5   0.5-0.6   0.6-0.8   0.8-1.0
VOTE     0.090   0.167     0.234     0.222     0.137     0.062
PROF     0.144   0.193     0.257     0.244     0.151     0.056
RSVMP    0.188   0.225     0.288     0.262     0.170     0.070
The impact of cRatio on the “High Consistency” test dataset.

cRatio   0-0.2   0.2-0.4   0.4-0.5   0.5-0.6   0.6-0.8   0.8-1.0
VOTE     0.091   0.233     0.498     0.437     0.228     0.110
PROF     0.172   0.285     0.578     0.487     0.262     0.214
RSVMP    0.205   0.309     0.605     0.525     0.314     0.235
The impact of cRatio on the entire test dataset.
Impact of Author Distribution
High V/AC: query posts having V/AC in [1.5, +∞), calculated in the 10-day test dataset
MAP values are highest for high V/AC ratio and consistent blog channels.
Authority
•To make good predictions about who is likely to become an authority on a topic, it is important to take into account network structure as well as content.
•This is a limited definition of authority, in which any author who posts on a topic counts as an authority. But how influential is that blogger?
Influence
Joint work with Forrest Stonedahl and Uri Wilensky
Supported by NSF Award IIS-0713619
Who are the most influential individuals in social networks?
•How does network structure affect influence?
•What is the value of an individual in a network?
•If we can simulate a diffusion process at the micro-level, then we can answer these questions.
NPV of a Network
•Calculating the Net Present Value of a Network
–Assume a manager can seed an arbitrary fraction of a network and she seeds the most highly influential individuals
–Discount rate of 0.1 (i.e., $1 tomorrow is worth $0.90 today)
–Then sum the discounted value of each adoption according to when it occurs
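The NPV calculation can be sketched in a few lines, assuming each adoption is worth $1 and adoptions are counted per time step, with the seeds adopting at step 0. The per-step factor (1 − 0.1)^t follows the slide's "$1 tomorrow is worth $.90 today"; the textbook form 1/(1 + 0.1)^t would give about $0.909 instead. `network_npv` is a hypothetical helper name.

```python
def network_npv(adoptions_per_step, discount_rate=0.1):
    # adoptions_per_step[t] = number of new adopters at step t (seeds adopt at t = 0).
    # Assumption: each adopter is worth $1, and value at step t is scaled by
    # (1 - discount_rate) ** t, matching "$1 tomorrow is worth $.90 today".
    factor = 1.0 - discount_rate
    return sum(count * factor ** t for t, count in enumerate(adoptions_per_step))


print(round(network_npv([10, 20, 5]), 2))  # 10 + 20*0.9 + 5*0.81 = 32.05
```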
Who should you seed?
•Which individuals will allow you to reach the widest audience as soon as possible?
•The standard rule of thumb is to seed those with the highest number of connections (highest degree)
•Alternative Strategies
•Seed the people whose friends do not talk to each other, to spread the message widely (low clustering coefficient)
•Seed the people who are closest to everyone else in the network, to centralize your message (low average path length)
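The three seeding rules can be compared on a toy graph with a small pure-Python sketch. It assumes an undirected network stored as a dict mapping each node to a set of neighbors, ranks nodes by a single metric per strategy, and breaks ties by node name; it is not the genetic-algorithm search used in the experiments. Nodes with fewer than two neighbors get a clustering coefficient of 0 here, which is one common convention. The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def degree(graph, v):
    return len(graph[v])


def clustering(graph, v):
    # Fraction of pairs of v's neighbors that are themselves connected
    nbrs = graph[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)


def avg_path_length(graph, v):
    # Mean BFS distance from v to every node it can reach
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    return sum(others) / len(others) if others else float("inf")


def pick_seeds(graph, k, strategy):
    # Rank nodes by one metric: high degree, low clustering, or low path length
    keys = {
        "degree": lambda v: (-degree(graph, v), v),
        "clustering": lambda v: (clustering(graph, v), v),
        "path_length": lambda v: (avg_path_length(graph, v), v),
    }
    return sorted(graph, key=keys[strategy])[:k]


graph = {  # hypothetical toy network: triangle A-B-C with a tail C-D-E
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
print(pick_seeds(graph, 1, "degree"))       # ['C']
print(pick_seeds(graph, 1, "clustering"))   # ['D']
print(pick_seeds(graph, 1, "path_length"))  # ['C']
```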
How many to Seed?
•Seeding more people means the message spreads quicker, but
•Seeding more people costs more, and
•At a certain point you start seeding people who would have adopted anyway because of their friends
•So how many people should we seed?
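The trade-off can be illustrated with a deliberately simple, hypothetical model: every neighbor of an adopter adopts on the next step, each adopter is worth $1 discounted by 0.9 per step, and each seed costs a fixed `seed_cost`. On a 10-node path, seeding both ends beats seeding one end, while seeding two adjacent nodes is worse than seeding one, because node 1 would have adopted from node 0 anyway. This is not the diffusion model used in the experiments.

```python
def spread(graph, seeds):
    # Hypothetical deterministic contagion: every neighbor of a new adopter adopts next step.
    # Returns the number of new adopters at each step, starting with the seeds.
    adopted = set(seeds)
    frontier = set(seeds)
    per_step = [len(adopted)]
    while frontier:
        nxt = {w for u in frontier for w in graph[u]} - adopted
        if not nxt:
            break
        adopted |= nxt
        frontier = nxt
        per_step.append(len(nxt))
    return per_step


def net_value(graph, seeds, seed_cost, discount_rate=0.1):
    # Discounted adoptions ($1 each, factor 0.9 per step) minus the cost of seeding
    factor = 1.0 - discount_rate
    npv = sum(n * factor ** t for t, n in enumerate(spread(graph, seeds)))
    return npv - seed_cost * len(seeds)


path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
for seeds in ([0], [0, 9], [0, 1]):
    print(seeds, round(net_value(path, seeds, seed_cost=1.0), 2))
# [0] 5.51
# [0, 9] 6.19
# [0, 1] 5.13
```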
Experimental Setup
•Five networks, including random, lattice, small-world, and preferential attachment
•Two scenarios: “medium” & “high” virality
•30 genetic algorithm searches to determine the best seeding strategies
Best Primary Strategies
Optimal Twitter Seeds
Alumni Network
Influence
People with lots of friends know other people with lots of friends, which constrains social contagion.
The most influential people have lots of friends but their friends don’t know each other.
But this assumes that all individuals trust each other equally. What happens when trust varies over a network?
Trust
Joint work with Hossam Sharara and Lise Getoor
Supported by NSF Awards IIS-0746930 and IIS-1018361
Motivation
[Figure: social network of Ann, Bob, Janet, Mary and John receiving referral offers]
“WOW… I’ll send it over to everyone”
Online Bookstore (Invite a friend and get 10% off your next purchase)
MovieRental.com (Refer a friend and get $10 off your next rental)
“Bob and Mary will definitely be interested. However, I think Ann is not much into movies”
Objectives
Capture the diversity in user preferences for different products
Model the change in influence probabilities across multiple campaigns
Design a viral marketing strategy that takes changes in trust based on these factors into account
Dataset
Social Network (user-user following links)
• 11,942 users
• 1.3M follow edges
Digg Network (user-story digging links)
• 48,554 news stories
• 1.9M digg edges
• 6 months (Jul 2010 – Dec 2010)
Differential Adaptive Diffusion
The influence probability between two peers (u, v) for product category c can be rewritten in terms of two components:
• Confidence of user v in u at campaign i
• Preference of user v for product type c
The confidence weights (the confidence of user v in u at campaign i) are updated at the end of each campaign.
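Because the equation itself did not survive, the sketch below makes an explicit assumption: that the influence probability is the product of v's confidence in u and v's preference for category c, and that several recommenders act as independent attempts (independent-cascade style). Both are illustrative choices, not the model's stated form; the function names and values are hypothetical.

```python
def influence_prob(confidence, preference, u, v, c):
    # ASSUMPTION: the (lost) equation is taken to be the product
    # confidence of v in u  x  preference of v for category c, capped at 1
    return min(1.0, confidence[(v, u)] * preference[(v, c)])


def adoption_prob(confidence, preference, recommenders, v, c):
    # ASSUMPTION: each recommender gets one independent chance to convert v,
    # so v stays unconverted only if every attempt fails
    p_none = 1.0
    for u in recommenders:
        p_none *= 1.0 - influence_prob(confidence, preference, u, v, c)
    return 1.0 - p_none


confidence = {("Mary", "John"): 0.8, ("Mary", "Ann"): 0.5}  # hypothetical values
preference = {("Mary", "movies"): 0.5}
print(round(adoption_prob(confidence, preference, ["John", "Ann"], "Mary", "movies"), 2))  # 0.55
```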
Experimental Evaluation
Evaluate the model's performance in predicting future adoptions
We use the first four months of the Digg.com dataset for learning the influence probabilities, and the last two months for testing
Results
The Adaptive model, taking both the diffusion dynamics and the users' heterogeneity into account, yields better performance
Adaptive Rewards
Successful recommendations are rewarded α × r units, while failed ones are penalized (1 − α) × r units
α: conservation parameter
Most existing viral marketing strategies assume α = 1 (no penalty, so there is no reason for the user to be selective)
The penalty term helps maintain the average overall confidence level between different peers
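The reward rule can be sketched as an update to a peer's confidence after each recommendation. This assumes confidence lives in [0, 1] and is clamped there, which the slide does not state; `update_confidence` is a hypothetical name. With α = 1 a failed recommendation costs nothing, which is why such strategies give users no reason to be selective.

```python
def update_confidence(conf, success, alpha, r):
    # Success: +alpha * r; failure: -(1 - alpha) * r (the slide's reward rule).
    # ASSUMPTION: confidence is kept in [0, 1] by clamping.
    delta = alpha * r if success else -(1.0 - alpha) * r
    return min(1.0, max(0.0, conf + delta))


print(update_confidence(0.5, True, alpha=0.5, r=0.1))   # 0.55
print(update_confidence(0.5, False, alpha=0.5, r=0.1))  # 0.45
print(update_confidence(0.5, False, alpha=1.0, r=0.1))  # 0.5: alpha = 1 never penalizes
```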
Experimental Setup
An agent-based model simulates the behavior of customers in different settings
When an agent adopts the product, it makes a probabilistic decision to send a recommendation based on its knowledge about the peers’ preferences
The objective of each agent is to maximize its expected reward according to the existing strategy
Two sets of experiments
• Fully observable: The agents are allowed to directly observe the preferences of their peers
• Learning preferences: The agents have to learn the peer’s preferences based on their response to previous recommendations
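An agent maximizing expected reward under the α scheme recommends to a peer with adoption probability p only when pαr − (1 − p)(1 − α)r > 0, which simplifies to p > 1 − α. The sketch applies that threshold deterministically, a simplification of the probabilistic decision the agents make; the function names and peer probabilities are hypothetical.

```python
def expected_reward(p_adopt, alpha, r=1.0):
    # Reward alpha * r with probability p_adopt, penalty (1 - alpha) * r otherwise
    return p_adopt * alpha * r - (1.0 - p_adopt) * (1.0 - alpha) * r


def recipients(peer_probs, alpha, r=1.0):
    # SIMPLIFICATION: recommend whenever the expected reward is positive,
    # which is equivalent to p_adopt > 1 - alpha
    return sorted(peer for peer, prob in peer_probs.items()
                  if expected_reward(prob, alpha, r) > 0)


peers = {"Ann": 0.2, "Bob": 0.9, "Mary": 0.7}  # hypothetical adoption probabilities
print(recipients(peers, alpha=0.5))  # ['Bob', 'Mary']
print(recipients(peers, alpha=1.0))  # ['Ann', 'Bob', 'Mary']
```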
Fully Observable
• Intermediate values of α (e.g., α = 0.5) consistently maintain high adoption rates and high overall trust over a large number of marketing campaigns
Learning Preferences
•Allowing agents to learn the preferences accounts for both the product preferences and the confidence levels
Trust
•We can make better predictions about adoption if we take into account heterogeneous preferences and dynamic trust.
•We can create better mechanisms that encourage more trust within social networks.
Authority
Influence
Trust
www.rhsmith.umd.edu/ccb
bit.ly/ccbssrn
Best Solution vs. “pure degree” seeding
Case Study: Digg.com
Social news website
Users “submit” stories in different topics, which can then be “digged” by other users
Users can “follow” other users to get their submissions and diggs on their homepage
Following links define the social network
User submissions serve as a proxy for user preferences for different topics
User diggs are analogous to product adoptions
Adaptive Viral Marketing
Recommendations are most effective when sent to the right subset of friends
• Highly selective behavior → limited exposure
• Spamming → lower confidence levels, limited returns
What is the appropriate mechanism for maximizing both the product spread and adoption?
Kernel Functions
Linear Kernel
• If v adopts the product
Each peer u who recommended the product to v gets a credit proportional to the time elapsed (last recommender max. credit)
• If v doesn’t adopt the product
Each peer u who recommended the product to v gets penalized proportional to the time elapsed (last recommender max. penalty)
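The linear kernel can be sketched as follows, assuming recommendation times are measured from the start of the campaign and each recommender's share is t / t_last, so the last recommender receives the full reward α × r or the full penalty (1 − α) × r. The slide does not give the exact normalization; the function name and times are hypothetical.

```python
def linear_kernel_credit(rec_times, adopted, alpha, r):
    # rec_times: {peer u: time at which u recommended the product to v}
    # ASSUMPTION: times are measured from the campaign start (all > 0) and the
    # weight is t / t_last, so the last recommender receives the full amount.
    t_last = max(rec_times.values())
    full = alpha * r if adopted else -(1.0 - alpha) * r
    return {u: full * t / t_last for u, t in rec_times.items()}


times = {"u1": 1, "u2": 3, "u3": 4}  # hypothetical recommendation times
print(linear_kernel_credit(times, adopted=True, alpha=0.5, r=1.0))
# {'u1': 0.125, 'u2': 0.375, 'u3': 0.5}
```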
Experimental Setup
Simulate the diffusion of 500 campaigns for products from 5 different categories
We use a linear kernel for adjusting the confidence levels between peers after each campaign
Effect of Spammers
To test the robustness of our proposed method, we inserted spamming agents in the network
A spamming agent forwards every product recommendation to all of its peers, regardless of their preferences
We set α = 0.5 for all the other agents, and vary the number of seeded spammers
•The network adapts to the presence of spammers (dropping their confidence levels), and continues to maintain adoption levels through trusted links
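This adaptation follows from the α reward rule. The sketch tracks the expected confidence a peer places in a sender who recommends every campaign, assuming each per-campaign change equals the mean of +α × r and −(1 − α) × r, clamped to [0, 1]. With α = 0.5 and r = 0.1, a spammer whose forwards are adopted 20% of the time drifts down to zero confidence, while a selective sender whose recommendations are adopted 80% of the time rises to full confidence. The probabilities and the function name are hypothetical.

```python
def expected_confidence(conf, p_adopt, alpha, r, campaigns):
    # Expected-value trajectory of a peer's confidence in a sender who recommends
    # in every campaign. ASSUMPTION: the per-campaign change is the mean of
    # +alpha * r (peer adopts) and -(1 - alpha) * r (peer does not), clamped to [0, 1].
    history = [conf]
    drift = p_adopt * alpha * r - (1.0 - p_adopt) * (1.0 - alpha) * r
    for _ in range(campaigns):
        conf = min(1.0, max(0.0, conf + drift))
        history.append(conf)
    return history


spammer = expected_confidence(0.5, p_adopt=0.2, alpha=0.5, r=0.1, campaigns=20)
selective = expected_confidence(0.5, p_adopt=0.8, alpha=0.5, r=0.1, campaigns=20)
print(spammer[-1], selective[-1])  # 0.0 1.0
```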