Authority, Trust and Influence: The Complex Network of Social Media

Bill Rand


What does Facebook have to do with Complexity?

Social Media is an archetypal Complex System.

– Social Networks are where individuals meet groups

– Social Science meets Technology

– Psychology meets Information Processing

– Design meets Engineering

– Sociology meets Innovation

– All Driven by Constant Change, Permanent Evolution

Complex Systems are about individual actions resulting in emergent patterns, and how those patterns feed back to affect individual actions.

Questions

• Where does authority come from in Social Media?

• Authority is not just what you know but who you know and who knows you...

• Who are the most influential individuals in social media?

• It may not just be those who are the most popular...

• How is trust earned in social media?

• We can design new social network mechanisms that increase trust in social networks...

Authority

joint work with Shanchan Wu, Tamer Elsayed and Louiqa Raschid

Supported by NSF Awards CMMI 0753124, IIS 0960963, & IIS 1018361

Motivation

• Example

• Gap logo fiasco in the Fall of 2010 (10/4).

• Gap introduced a new logo, replacing almost overnight the iconic logo it had used for 20 years.

• There was an immediate outpouring of negative comments.

• Gap quickly reverted to the old logo (10/12).

• Goal

• Identify which author (blog channel) is likely to become an authority on Topic X (e.g., Gap Logo Redesign) in the near future.


Blog Channels

• A blog channel is an event stream of posts or blog entries.

• There are links between different blog channels, and links pointing outside of the blogosphere.

[Figure: blog channels b1 to b4 and their posts p1 to p10]


Problem Definition

• Future Author(ity) Prediction Problem (FAPP)

• Given a focal query post on Topic X, what other blog channels (authors) are likely to post on that topic in the (near) future?

• The goal is to identify up to K channels that will publish a post on Topic X in the near future.

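[Figure: blog channels b1 to b4 and A to E, with posts on topics X, Y, Z and W; four channels are marked "X?" as candidates to post on Topic X]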


Features

• Content Features

  • Post Similarity

  • Profile Similarity

• Network Features

  • Blog-Blog Links

  • External (non-blog) Links


Prediction Methods

• PROF (Profile-Based Prediction)

  • PROF retrieves the top K blog channels ranked by the similarity of their profiles to a focal query post q.

• VOTE (Voting-Based Prediction)

  • VOTE chooses the top K channels using the aggregate similarity score between a focal query post q and all historical posts in a channel b.

• RSVMP (Ranking SVM-Based Prediction)

  • Content Features

    • post-post similarity, post-profile similarity, profile-profile similarity, consistency scores, named entities

  • Network Features

    • links, external links
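The two content-only baselines can be sketched in a few lines. This is a minimal sketch, not the study's implementation: it assumes bag-of-words term counts and cosine similarity (the slide does not name the similarity function), builds a channel profile by merging all of a channel's posts, and aggregates VOTE scores with a plain sum. The helper names vectorize, cosine, top_k, prof and vote, and the toy channels, are hypothetical.

```python
from collections import Counter
from math import sqrt


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(scores: dict[str, float], k: int) -> list[str]:
    # Highest score first; sorting is stable, so ties keep input order.
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def prof(channels: dict[str, list[str]], query: str, k: int) -> list[str]:
    """PROF sketch: merge each channel's posts into one profile vector,
    then rank channels by profile-to-query similarity."""
    q = vectorize(query)
    scores = {}
    for name, posts in channels.items():
        profile = Counter()
        for post in posts:
            profile.update(vectorize(post))
        scores[name] = cosine(profile, q)
    return top_k(scores, k)


def vote(channels: dict[str, list[str]], query: str, k: int) -> list[str]:
    """VOTE sketch: each historical post votes with its similarity to the
    query; a channel's score is the sum of its posts' votes."""
    q = vectorize(query)
    scores = {name: sum(cosine(vectorize(p), q) for p in posts)
              for name, posts in channels.items()}
    return top_k(scores, k)


# Hypothetical toy channels, each with its historical posts.
channels = {
    "b1": ["gap logo redesign backlash", "new gap logo"],
    "b2": ["election polls today"],
    "b3": ["logo design trends", "fashion week"],
}
print(prof(channels, "gap logo", 2), vote(channels, "gap logo", 2))
```

On these toy channels both methods rank b1 first and b3 second for the query "gap logo". RSVMP differs in that it learns, with a ranking SVM, how to weight content scores like these together with the link features.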


Dataset and Metrics

• Original Data

  • From Spinn3r: 142 GB, two months, 44 million blog posts.

• Data for experiments

  • English blog channels only.

  • Blog channels containing between 30 and 120 posts.

• Metric: mean average precision (MAP)

Time range: 07/30/08–10/1/08

Number of blog posts: 2,185,810

Number of blog channels: 42,005

Avg number of posts per blog channel: 52.04
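MAP averages, over all query posts, the average precision (AP) of each ranked list of predicted channels. A minimal sketch using the textbook definition, assuming AP is normalized by the number of channels that actually post on the topic in the test window (the slides do not say whether the study normalizes by that count or by K). Function names are hypothetical.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """AP for one query: mean of precision@i over the ranks i that hold a
    relevant channel, divided by the number of relevant channels."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, channel in enumerate(ranked, start=1):
        if channel in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average AP over (ranked predictions, true future posters) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical query posts.
runs = [
    (["b1", "b2", "b3"], {"b1", "b3"}),  # hits at ranks 1 and 3
    (["b4", "b5"], {"b5"}),              # hit at rank 2
]
print(mean_average_precision(runs))
```

Here the first query scores (1 + 2/3) / 2 = 5/6 and the second 1/2, so MAP is 2/3.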


Results

Training data: 30 days

Test data (for ground truth): 10 days

Diffusion Stage

cRatio 0-0.2 0.2-0.4 0.4-0.5 0.5-0.6 0.6-0.8 0.8-1.0

VOTE 0.090 0.167 0.234 0.222 0.137 0.062

PROF 0.144 0.193 0.257 0.244 0.151 0.056

RSVMP 0.188 0.225 0.288 0.262 0.170 0.070

The impact of cRatio on the “High Consistency” test dataset.

cRatio 0-0.2 0.2-0.4 0.4-0.5 0.5-0.6 0.6-0.8 0.8-1.0

VOTE 0.091 0.233 0.498 0.437 0.228 0.110

PROF 0.172 0.285 0.578 0.487 0.262 0.214

RSVMP 0.205 0.309 0.605 0.525 0.314 0.235

The impact of cRatio on the entire test dataset.


Impact of Author Distribution

High V/AC: query posts with V/AC in [1.5, +∞), calculated in the 10-day test dataset

MAP values are highest for high V/AC ratio and consistent blog channels.

Authority

• To make good predictions about who is likely to become an authority on a topic, it is important to take into account network structure as well as content.

• This is a limited definition of authority: anyone who posts on a topic counts as an authority on it. But how influential is that blogger?

Influence

joint work with Forrest Stonedahl and Uri Wilensky

Supported by NSF Award IIS-0713619

Who are the most influential individuals in social networks?

• How does network structure affect influence?

• What is the value of an individual in a network?

• If we can simulate a diffusion process at the micro-level, then we can answer these questions.

NPV of a Network

• Calculating the Net Present Value of a Network

  – Assume a manager can seed an arbitrary fraction of a network and she seeds the most highly influential individuals

  – Discount rate of 0.1 (i.e., $1 tomorrow is worth $0.90 today)

  – Then we add up the discounted value of each adoption, according to when it occurs
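The NPV calculation above can be sketched as follows, assuming each adoption is worth one unit of value and that an adoption t steps in the future is discounted by (1 − 0.1)^t, which matches the slide's "$1 tomorrow is worth $0.90 today". The function name and the value_per_adoption parameter are hypothetical.

```python
def network_npv(new_adopters_per_step: list[int], discount: float = 0.1,
                value_per_adoption: float = 1.0) -> float:
    """Sum of discounted adoptions: an adoption at step t is worth
    value_per_adoption * (1 - discount) ** t (step 0 is undiscounted)."""
    return sum(n * value_per_adoption * (1 - discount) ** t
               for t, n in enumerate(new_adopters_per_step))


# Same 10 adopters, reached immediately versus two steps later.
print(network_npv([10, 0, 0]), network_npv([0, 0, 10]))
```

Under these assumptions, 10 adoptions right away are worth 10.0, while the same 10 adoptions two steps later are worth about 8.1, which is why seeding strategies are judged on speed as well as reach.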

Who should you seed?

• Which individuals will allow you to reach the widest audience as soon as possible?

• The standard rule of thumb is to seed those with the highest number of connections

• Alternative strategies

  • Seed the people whose friends do not talk to each other, so the message spreads widely (low clustering coefficient)

  • Seed the people who are closest to everyone else in the network, so the message starts from the center (low average path length)
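The three seeding heuristics can be compared on a toy graph. A minimal sketch, assuming an undirected network stored as adjacency sets; "closest to everyone else" is measured as the mean breadth-first-search distance to all reachable nodes, and ties on clustering or path length are broken in favor of higher degree. The graph and function names are hypothetical.

```python
from collections import deque

Graph = dict[str, set[str]]


def make_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def clustering(g: Graph, v: str) -> float:
    """Local clustering coefficient: share of v's friend pairs who are friends."""
    friends = list(g[v])
    k = len(friends)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if friends[j] in g[friends[i]])
    return 2 * links / (k * (k - 1))


def avg_path_length(g: Graph, v: str) -> float:
    """Mean breadth-first-search distance from v to every node it can reach."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    return sum(others) / len(others) if others else float("inf")


def pick_seeds(g: Graph, k: int, strategy: str) -> list[str]:
    """Return the k best nodes under one heuristic (lower key = better)."""
    keys = {
        "degree": lambda v: -len(g[v]),                                 # most friends
        "clustering": lambda v: (clustering(g, v), -len(g[v])),        # friends don't talk
        "path_length": lambda v: (avg_path_length(g, v), -len(g[v])),  # closest to all
    }
    if strategy not in keys:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(g, key=keys[strategy])[:k]


# Two tight clusters (a-b-c and e-f-g) joined through the bridge node d.
g = make_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")])
for s in ("degree", "clustering", "path_length"):
    print(s, pick_seeds(g, 2, s))
```

In this graph the degree rule picks the hubs c and e, whose friends partly know one another, while both alternative rules put d, the bridge joining the two clusters, first.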

How many to Seed?

• Seeding more people means the message spreads faster, but

• Seeding more people costs more, and

• At a certain point you start seeding people who would have adopted anyway because of their friends

• So how many people should we seed?
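The trade-off can be illustrated with a deliberately simple contagion model. This sketch assumes, hypothetically, that every friend of an adopter adopts one step later, that each adoption is worth 1 unit discounted at 0.1 per step, and that each seed costs a fixed seed_cost; none of these parameters come from the study. Under those assumptions a node's adoption time is its distance to the nearest seed.

```python
from collections import deque


def adoption_times(g: dict[int, set[int]], seeds: list[int]) -> dict[int, int]:
    """Multi-source BFS: under the assumed rule that every friend of an
    adopter adopts one step later, a node adopts at its distance from the
    nearest seed (seeds adopt at step 0)."""
    times = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in times:
                times[w] = times[u] + 1
                queue.append(w)
    return times


def net_value(g: dict[int, set[int]], seeds: list[int],
              seed_cost: float, discount: float = 0.1) -> float:
    """Discounted adoptions (1 unit each) minus a hypothetical per-seed cost."""
    times = adoption_times(g, seeds)
    npv = sum((1 - discount) ** t for t in times.values())
    return npv - seed_cost * len(seeds)


# Hypothetical chain of 7 people: 0 - 1 - 2 - 3 - 4 - 5 - 6.
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
for seeds in ([3], [1, 3, 5], list(range(7))):
    print(len(seeds), round(net_value(chain, seeds, seed_cost=0.5), 3))
```

On this chain with seed_cost = 0.5, one central seed nets about 5.38, three seeds about 5.1, and seeding everyone 3.5: the extra seeds mostly buy adoptions that friends would have produced a step or two later anyway.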

Experimental Setup

Five networks: random, lattice, small-world, preferential attachment, and Twitter

Two scenarios: “medium” & “high” virality

30 genetic algorithm searches to determine the best seeding strategies

Best Primary Strategies

Optimal Twitter Seeds

Alumni Network

Influence

People with lots of friends know other people with lots of friends, which constrains social contagion.

The most influential people have lots of friends, but their friends don’t know each other.

But this assumes that all individuals trust each other equally. What happens when trust varies over a network?

Trust

joint work with Hossam Sharara and Lise Getoor

Supported by NSF Awards IIS-0746930 and IIS-1018361

Motivation

WOW… I’ll send it over to everyone

Online Bookstore (Invite a friend and get 10% off your next purchase)

MovieRental.com (Refer a friend and get $10 off your next rental)

Bob and Mary will definitely be interested. However, I think Ann is not much into movies.

[Figure: network of Ann, Bob, Janet, Mary and John]

Objectives

Capture the diversity in user preferences for different products

Model the change in influence probabilities across multiple campaigns

Design a viral marketing strategy that takes changes in trust based on these factors into account

Dataset

Social Network (user-user following links)

• 11,942 users

• 1.3M follow edges

Digg Network (user-story digging links)

• 48,554 news stories

• 1.9M digg edges

• 6 months (Jul 2010 – Dec 2010)

Differential Adaptive Diffusion

The influence probability between two peers (u, v) for product category c can be rewritten in terms of:

• Confidence of user v in u at campaign i

• Preference of user v for product type c

The confidence weights are updated at the end of each campaign

Experimental Evaluation

Evaluate the model performance in predicting future adoptions

We use the first four months of the Digg.com dataset to learn the influence probabilities and the last two months for testing

Results

The Adaptive model, which takes both the diffusion dynamics and the users’ heterogeneity into account, yields better performance

Adaptive Rewards

Successful recommendations are awarded α × r units, while failed ones are penalized (1 − α) × r units

α: conservation parameter

Most existing viral marketing strategies assume α = 1 (failures carry no penalty, so there is no reason for the user to be selective)

The penalty term helps maintain the average overall confidence level between different peers
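The reward rule can be written out directly. In this sketch reward follows the slide; update_confidence is a hypothetical helper that assumes the reward is simply added to a peer's confidence weight at the end of a campaign, which the slides imply but do not spell out.

```python
def reward(outcomes: list[bool], alpha: float, r: float = 1.0) -> float:
    """Net reward for a batch of recommendations: +alpha*r for each one that
    led to an adoption, -(1 - alpha)*r for each one that did not."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return successes * alpha * r - failures * (1 - alpha) * r


def update_confidence(confidence: float, adopted: bool,
                      alpha: float, r: float = 1.0) -> float:
    """Hypothetical end-of-campaign update of the confidence in a recommender."""
    return confidence + (alpha * r if adopted else -(1 - alpha) * r)


# One adoption out of four recommendations ("spamming" four friends).
outcomes = [True, False, False, False]
print(reward(outcomes, alpha=1.0), reward(outcomes, alpha=0.5))
```

With α = 1 the user who recommends to four friends and gets one adoption still earns 1.0, so spamming is free; with α = 0.5 the same behavior nets −1.0.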

Experimental Setup

An agent-based model simulates the behavior of customers in different settings

When an agent adopts the product, it makes a probabilistic decision to send a recommendation based on its knowledge about the peers’ preferences

The objective of each agent is to maximize its expected reward according to the existing strategy
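Combining the reward rule with an estimate p of the probability that peer v adopts gives the expected reward of recommending to v: p·α·r − (1 − p)·(1 − α)·r, which is positive exactly when p > 1 − α. The sketch below uses that threshold as a deterministic simplification of the agents' probabilistic decision; the function names and peer probabilities are hypothetical and echo the Bob, Mary and Ann example.

```python
def expected_reward(p_adopt: float, alpha: float, r: float = 1.0) -> float:
    """Expected reward of one recommendation given the peer's adoption probability."""
    return p_adopt * alpha * r - (1 - p_adopt) * (1 - alpha) * r


def choose_recipients(peer_probs: dict[str, float], alpha: float) -> list[str]:
    """Recommend only to peers with positive expected reward (p > 1 - alpha)."""
    return [v for v, p in peer_probs.items() if expected_reward(p, alpha) > 0]


peers = {"Bob": 0.8, "Mary": 0.7, "Ann": 0.1}
print(choose_recipients(peers, alpha=0.5))
print(choose_recipients(peers, alpha=1.0))
```

With α = 0.5 the agent recommends only to Bob and Mary; with α = 1 the threshold drops to 0 and it recommends to everyone, which is the "no reason to be selective" case.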

Two sets of experiments

• Fully observable: the agents can directly observe the preferences of their peers

• Learning preferences: the agents have to learn their peers’ preferences from the peers’ responses to previous recommendations

Fully Observable

• Intermediate values of α (e.g., α = 0.5) consistently maintain high adoption rates and high overall trust over a large number of marketing campaigns

Learning Preferences

• Allowing agents to learn the preferences accounts for both the product preferences and the confidence levels

Trust

• We can make better predictions about adoption if we take into account heterogeneous preferences and dynamic trust.

• We can create better mechanisms that encourage more trust within social networks.

Authority

Influence

Trust

Any questions? [email protected]

www.rhsmith.umd.edu/ccb/

bit.ly/ccbssrn

Best Solution vs. “pure degree” seeding

Case Study: Digg.com

Social news website

Users “submit” stories in different topics, which can then be “digged” by other users

Users can “follow” other users to get their submissions and diggs on their homepage

Following links define the social network

User submissions serve as a proxy for user preferences for different topics

User diggs are analogous to product adoptions

Adaptive Viral Marketing

User recommendations are most effective when sent to the right subset of friends

• Highly selective behavior leads to limited exposure

• Spamming leads to lower confidence levels and limited returns

What is the appropriate mechanism for maximizing both the product spread and adoption?

Kernel Functions

Linear Kernel

• If v adopts the product

Each peer u who recommended the product to v gets a credit proportional to the time elapsed (the last recommender gets the maximum credit)

• If v doesn’t adopt the product

Each peer u who recommended the product to v gets a penalty proportional to the time elapsed (the last recommender gets the maximum penalty)
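One reading of the linear kernel, sketched under assumptions: recommendation times are measured from the start of the campaign, the full credit and penalty are the Adaptive Rewards amounts α × r and (1 − α) × r, and each recommender's share scales linearly with its time, so the last recommender receives the full amount. The function name and these scaling choices are hypothetical, not taken from the study.

```python
def linear_kernel(rec_times: dict[str, float], adopted: bool,
                  alpha: float, r: float = 1.0) -> dict[str, float]:
    """Credit (positive) or penalty (negative) for each peer u who recommended
    the product to v, scaled linearly by the time of u's recommendation."""
    if not rec_times or max(rec_times.values()) <= 0:
        raise ValueError("need at least one recommendation at a time > 0")
    t_last = max(rec_times.values())
    # Full amount goes to the last recommender: +alpha*r on adoption,
    # -(1 - alpha)*r otherwise (assumed link to the Adaptive Rewards rule).
    full = alpha * r if adopted else -(1 - alpha) * r
    return {u: full * t / t_last for u, t in rec_times.items()}


# Hypothetical recommendation times for three peers.
recs = {"Ann": 1.0, "Bob": 2.0, "Mary": 4.0}
print(linear_kernel(recs, adopted=True, alpha=0.5))
print(linear_kernel(recs, adopted=False, alpha=0.5))
```

With α = 0.5, an adoption gives Mary (the last recommender) 0.5, Bob 0.25 and Ann 0.125; a non-adoption applies the same shares as penalties.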

Experimental Setup

Two sets of experiments

• Fully observable: the agents can directly observe the preferences of their peers

• Learning preferences: the agents have to learn their peers’ preferences from the peers’ responses to previous recommendations

Simulate the diffusion of 500 campaigns for products from 5 different categories

We use a linear kernel for adjusting the confidence levels between peers after each campaign

Effect of Spammers

To test the robustness of our proposed method, we inserted spamming agents into the network

A spamming agent forwards every product recommendation to all of its peers, regardless of their preferences

We set α = 0.5 for all other agents and vary the number of seeded spammers

• The network adapts to the presence of spammers (dropping their confidence levels), and continues to maintain adoption levels through trusted links
