Efficient Identification of Starters and Followers in Social Media

Michael Mathioudakis, Nick Koudas

• Formalize a definition of “starters” and “followers” in blogs

• Random sampling approaches to achieve significant efficiency while identifying “starters” and “followers”

Starters vs Followers

• Starter: a blogger who generates posts that others link to over a period of time

• Follower: a blogger that links to other blog posts over a period of time

Notation

Notation Definition

P The set of all posts in the query result set

B The set of blogs in a query result set P

The set of posts in P coming from blog B

L The set of all links between posts in P

G A graph used as an abstract representation of P and the links L

V The node set of G

E The edge set of G

A query at time T

Calculating Starters and Followers

• In degree of node

• Out degree of node

• Degree of node

Brute Force

• Query the database for all posts

• Calculate the degree of every node and sum

• Why not?

– Retrieving all posts can be costly

– Lots of overhead
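The brute-force computation can be sketched as follows. The degree formulas on the slides did not survive, so this is a minimal illustration under stated assumptions: a blog's score is taken to be its in-links from other blogs minus its out-links to other blogs, and links within a single blog are ignored. The names `post_blog`, `links`, `brute_force_scores`, and `top_k_starters` are hypothetical.

```python
from collections import Counter


def brute_force_scores(post_blog, links):
    """Score every blog after reading every link.

    post_blog: dict mapping post id -> blog id (all posts in the result set)
    links:     iterable of (source_post, target_post) pairs
    """
    in_deg, out_deg = Counter(), Counter()
    for src, dst in links:
        src_blog, dst_blog = post_blog[src], post_blog[dst]
        if src_blog == dst_blog:
            continue  # assumption: links inside one blog do not count
        out_deg[src_blog] += 1
        in_deg[dst_blog] += 1
    # Assumed score: in-degree minus out-degree (high = starter, low = follower).
    return {b: in_deg[b] - out_deg[b] for b in set(post_blog.values())}


def top_k_starters(scores, k):
    """Return the k blogs with the highest score, ties broken by blog id."""
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]
```

Every post and every link is touched once, which is the cost the early-stopping and sampling approaches try to avoid.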

Deterministic Early-Stopping Conditions

• = enumerated subset of

• is the set of k starters

• If , then exists a pair ,with and such that

• Use linear inequalities to determine feasibility

Linear Inequalities

Linear Inequality Issues

• Result?

– Large domains

– Easily feasible

– Traverse almost all edges before stopping

• Solution?

– Relax requirements, use probabilistic guarantees

Probabilistic Early-Stopping Conditions

• Trade accuracy for efficiency

• Still aim to return starters

• Assume edges chosen uniformly at random

Probabilities

• for all pairs of nodes

• If Pr < 10%, return the result set

• How do you determine the bound for the probability?

Hoeffding’s Inequality

• Provides a lower bound

• Lower bound =

• Uniform sample should capture any skew

• Starters appear after few sampled edges
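For reference, the textbook two-sided form of Hoeffding's inequality can be evaluated as below. The bound on the slide itself is missing, so this is the general statement rather than the exact expression used in the talk: for n independent samples in [lo, hi], the probability that the sample mean deviates from its expectation by at least t is at most 2·exp(−2nt²/(hi−lo)²), and one minus that value is a lower bound on the confidence. The function names are hypothetical.

```python
import math


def hoeffding_tail(n, t, lo=0.0, hi=1.0):
    """Upper bound on P(|sample mean - true mean| >= t) for n samples in [lo, hi]."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * t * t / (hi - lo) ** 2))


def samples_needed(t, delta, lo=0.0, hi=1.0):
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * t * t))
```

For example, `samples_needed(0.1, 0.1)` gives the number of uniform samples after which a deviation of 0.1 has probability at most 10%, the threshold mentioned earlier.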

Random Sampling Techniques

• Out-degrees of all nodes are known

• Maximum out-degree of a node is known

• Sampling nodes uniformly at random

• Random walk approach

Out-Degrees Known
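One standard way to draw an edge uniformly at random when every out-degree is known is sketched below; the slide content for this technique is missing, so treat this as an illustration rather than the exact procedure from the talk. A source node is picked with probability proportional to its out-degree, then one of its out-edges is picked uniformly. The name `sample_edge_known_outdegrees` and the `out_edges` dictionary layout are assumptions.

```python
import random


def sample_edge_known_outdegrees(out_edges, rng=random):
    """Return one (src, dst) edge, uniform over all edges.

    out_edges: dict mapping node -> list of nodes it links to.
    """
    nodes = [v for v, targets in out_edges.items() if targets]
    weights = [len(out_edges[v]) for v in nodes]  # out-degree of each node
    src = rng.choices(nodes, weights=weights, k=1)[0]
    dst = rng.choice(out_edges[src])
    return src, dst
```

Each edge (u, v) is returned with probability (deg(u)/|E|)·(1/deg(u)) = 1/|E|, which is the uniformity the probabilistic stopping conditions assume.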

Out-Degrees Known Issues

• Knowing out-degree = strong assumption

• Requirements

– Retrieve all posts in query

– Extract all links

• Solution?

– Weaker assumption on distribution of edges

Maximum Out-Degree Known

Maximum Out-Degree Issues

• Blog graphs typically heavy-tailed

• Probability at one iteration =

• Expected iterations =
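When only the maximum out-degree is known, a common technique is rejection sampling, sketched here as an assumption about what the slide describes: pick a node uniformly, accept it with probability out-degree / d_max, then pick one of its out-edges uniformly. Under this scheme each iteration succeeds with probability |E| / (|V|·d_max), so the expected number of iterations is |V|·d_max / |E|, which grows large on heavy-tailed graphs. The names `sample_edge_max_outdegree`, `get_out_edges`, and `expected_iterations` are hypothetical, and the graph is assumed to contain at least one edge.

```python
import random


def sample_edge_max_outdegree(nodes, get_out_edges, d_max, rng=random):
    """Return ((src, dst), iterations) with the edge uniform over all edges.

    nodes:         list of all nodes
    get_out_edges: callable node -> list of targets (one lookup per call)
    d_max:         known upper bound on any node's out-degree
    """
    iterations = 0
    while True:
        iterations += 1
        src = rng.choice(nodes)
        targets = get_out_edges(src)
        # Accept with probability out-degree / d_max; nodes with no
        # out-edges are never accepted.
        if rng.random() < len(targets) / d_max:
            return (src, rng.choice(targets)), iterations


def expected_iterations(num_nodes, num_edges, d_max):
    """Expected iterations per accepted sample under the scheme above."""
    return num_nodes * d_max / num_edges
```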

Sampling Nodes Uniformly at Random

Sampling Nodes Uniformly at Random Issues

• Edges are not sampled uniformly at random

• Only unbiased estimates of edges from one node to another

• Can’t handle heavy-tailed distributions

• Leads to poor accuracy

Random Walk Approach

• 2 step approach

– Obtain a new graph from the input graph

– Obtain a Markov chain

Step 1 – Obtain New Graph

• Create a new graph H(V, E) from input graph

– Remove direction of edges

– Add self-loops

– Add edges between nodes returned in order

Step 2 – Create Markov Chain

• Markov Chain = MC(K, T)

– K = the possible states (nodes)

– T = possible transitions (edges)

The Random Walk

At a step of the walk

Follows a transition to one of its states

(b): Edge of current node = no lookup cost

(c): Edge of new node = random access cost
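The two steps and the walk itself can be sketched as below. The slides do not give the transition probabilities or the number of self-loops, so this sketch assumes one self-loop per node and a uniform choice among a node's neighbors in H; both are assumptions, as are the names `build_walk_graph` and `random_walk`.

```python
import random


def build_walk_graph(ordered_nodes, directed_edges):
    """Build H as undirected adjacency sets.

    ordered_nodes:  nodes in the order the query returned them
    directed_edges: iterable of (src, dst) links from the input graph
    """
    adj = {v: {v} for v in ordered_nodes}  # self-loop on every node
    for u, v in directed_edges:  # drop edge direction
        adj[u].add(v)
        adj[v].add(u)
    # Link consecutive nodes in returned order, which keeps H connected.
    for u, v in zip(ordered_nodes, ordered_nodes[1:]):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def random_walk(adj, start, steps, rng=random):
    """Walk for `steps` transitions, choosing uniformly among neighbors."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(sorted(adj[path[-1]])))
    return path
```

Because H is connected and every node has a self-loop, the walk can reach every node and is not trapped in a periodic cycle.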

Stopping the Random Walk

• At each step, for each pair of nodes

• Average the score over all pairs of nodes

• Stop when confScore > threshold
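The stopping rule above reduces to averaging and comparing against a threshold, as in this minimal sketch. The per-pair score formula is missing from the slides, so the sketch takes the per-pair scores as given; `pair_scores`, `conf_score`, and `should_stop` are hypothetical names.

```python
def conf_score(pair_scores):
    """Average the per-pair scores; pair_scores maps (u, v) -> score."""
    return sum(pair_scores.values()) / len(pair_scores)


def should_stop(pair_scores, threshold):
    """Stop the walk once the average score exceeds the threshold."""
    return bool(pair_scores) and conf_score(pair_scores) > threshold
```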

Results

Most in-links doesn’t necessarily mean the best starter

Results (continued)

Real World Application

• BlogScope

– Project of University of Toronto

– Provides graph and search output of blog data

– How does it work?

    • Crawler to gather blog data and filter spam

    • Stored in MySQL (1174.14 million posts)

    • Build statistics regularly

    • Provide correlation discovery, popularity curves, and hot keywords

Related Work

Discovering Leaders from Community Actions

Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan

Users perform actions (bookmark a URL, rate a song, buy a gadget, etc.)

Friends see actions and may perform same actions (influence)

Compute influence matrix with a sliding window working backwards

Pass over actions log only once

Uses frequent pattern discovery to determine leaders

Finds tribes where one user influences a group of people over a series of actions

Problem when there is a popular action where influence might not be a factor
