Page 1: Social Network Analysis and Visualization

Social Network Analysis

& Visualization

#Beginner #Twitter #SNA #DataViz #VTCC7

#Gephi #MongoDB #JavaScript #JVM #Node.js

@mirezez Alberto Ramirez [email protected]

Page 2: Social Network Analysis and Visualization

About Presenter

Currently

Systems Architect, iSystems

Previously

Happy ITLAPD 2015!

Page 3: Social Network Analysis and Visualization

Graph Basics

Nodes & Edges

Un/directed, Un/weighted

Degree, In-Degree, Out-Degree

Topology

Graph Density: Measures how many edges are in the graph compared to the maximum possible number of edges (complete graph).

Diameter: Longest shortest path between any two nodes in the network.

Radius: Minimum eccentricity of any node in the graph.
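
The definitions on this slide can be checked on a toy graph. The sketch below is only an illustration: the four-node path graph and all function names are made up for the example, and it assumes an undirected, unweighted, connected graph.

```javascript
// Hypothetical toy graph: undirected, unweighted, stored as an edge list.
const nodes = ['a', 'b', 'c', 'd'];
const edges = [['a', 'b'], ['b', 'c'], ['c', 'd']];

// Build an adjacency list (each undirected edge is stored in both directions).
function buildAdjacency(nodes, edges) {
  const adj = new Map(nodes.map((n) => [n, new Set()]));
  for (const [u, v] of edges) {
    adj.get(u).add(v);
    adj.get(v).add(u);
  }
  return adj;
}

// Density of an undirected simple graph: m / (n * (n - 1) / 2).
function density(nodeCount, edgeCount) {
  return nodeCount < 2 ? 0 : (2 * edgeCount) / (nodeCount * (nodeCount - 1));
}

// Eccentricity of a node: distance to the farthest reachable node (BFS).
function eccentricity(adj, start) {
  const dist = new Map([[start, 0]]);
  const queue = [start];
  let farthest = 0;
  for (let i = 0; i < queue.length; i++) {
    const u = queue[i];
    for (const v of adj.get(u)) {
      if (!dist.has(v)) {
        dist.set(v, dist.get(u) + 1);
        farthest = Math.max(farthest, dist.get(v));
        queue.push(v);
      }
    }
  }
  return farthest;
}

const adj = buildAdjacency(nodes, edges);
const ecc = nodes.map((n) => eccentricity(adj, n));
const stats = {
  degreeOfB: adj.get('b').size,                  // degree of node b
  density: density(nodes.length, edges.length),  // 3 of 6 possible edges
  diameter: Math.max(...ecc),                    // largest eccentricity
  radius: Math.min(...ecc),                      // smallest eccentricity
};
console.log(stats);
```

For a directed graph such as the Twitter follow network, in-degree and out-degree would be counted separately and the density denominator becomes n * (n - 1).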

Page 4: Social Network Analysis and Visualization

Real Networks: Properties

1. Growth: Networks are assembled one node at a time and increase in size.

2. Preferential attachment: As a new node joins the network, the probability that it links to a given existing node is proportional to the number of links that target node already has.

“Rich Get Richer”
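
A minimal sketch of growth with preferential attachment, assuming each new node adds exactly one link and the network starts from a single edge. The function names and the injectable `rng` parameter are hypothetical, chosen for the example.

```javascript
// Grow a network one node at a time. Each new node links to one existing
// node chosen with probability proportional to that node's current degree.
// Assumes totalNodes >= 2; rng must return a number in [0, 1).
function growNetwork(totalNodes, rng = Math.random) {
  const edges = [[0, 1]];   // seed network: a single edge between nodes 0 and 1
  const endpoints = [0, 1]; // every node appears once per link it has
  for (let newNode = 2; newNode < totalNodes; newNode++) {
    // Picking uniformly from `endpoints` is the degree-proportional choice.
    const target = endpoints[Math.floor(rng() * endpoints.length)];
    edges.push([newNode, target]);
    endpoints.push(newNode, target);
  }
  return edges;
}

// Count the degree of every node in an edge list.
function degrees(edges) {
  const degree = new Map();
  for (const [u, v] of edges) {
    degree.set(u, (degree.get(u) || 0) + 1);
    degree.set(v, (degree.get(v) || 0) + 1);
  }
  return degree;
}

const sample = growNetwork(1000);
const maxDegree = Math.max(...degrees(sample).values());
console.log(`edges: ${sample.length}, highest degree: ${maxDegree}`);
```

Storing one entry per link endpoint keeps the degree-weighted choice to a single array lookup; nodes that already have many links occupy more slots and are picked more often.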

Page 5: Social Network Analysis and Visualization

Social Network Examples

Undirected, Unweighted

Directed, Unweighted

Page 6: Social Network Analysis and Visualization

Finding Cliques

Cliques

• k-clique: a subgraph of k nodes in which all nodes are adjacent to each other.

• n-clique, where n is a positive integer, is a collection C of vertices in which any two vertices u, v ∈ C have distance ≤ n.

• p-clique, where p is a real number between 0 and 1, is a collection C of vertices in which any vertex has ≥ p|C| neighbors in C.

Trouble with Clique Targeting

1. Not resilient networks.

2. Uniformity in the way cliques are defined can lead to little to no insight into that subgraph.

3. The clique might be a narrowing of a larger, more legitimate community to be evaluated.
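
The strict clique and p-clique definitions can be tested directly against a candidate set of nodes. This is a rough sketch on a made-up graph; the helper names are hypothetical, the member list is assumed to contain no duplicates, and the distance-based n-clique is not covered.

```javascript
// Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to c.
const adj = new Map([
  ['a', new Set(['b', 'c'])],
  ['b', new Set(['a', 'c'])],
  ['c', new Set(['a', 'b', 'd'])],
  ['d', new Set(['c'])],
]);

// Count how many members of the candidate set are neighbors of `node`.
function neighborsInside(adj, node, members) {
  return members.filter((m) => m !== node && adj.get(node).has(m)).length;
}

// Strict clique: every member is adjacent to every other member.
function isClique(adj, members) {
  return members.every(
    (u) => neighborsInside(adj, u, members) === members.length - 1
  );
}

// p-clique: every member has at least p * |C| neighbors inside C.
function isPClique(adj, members, p) {
  return members.every(
    (u) => neighborsInside(adj, u, members) >= p * members.length
  );
}

console.log(isClique(adj, ['a', 'b', 'c']));              // the triangle
console.log(isClique(adj, ['a', 'b', 'c', 'd']));         // d is not adjacent to a or b
console.log(isPClique(adj, ['a', 'b', 'c', 'd'], 0.25));  // each node needs >= 1 neighbor
```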

Page 7: Social Network Analysis and Visualization

Finding K-Cores

K-Cores: Maximal subgraph with minimum degree at least k.

In graph G, as k increases, the subgraph becomes more exclusive.
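
One common way to find a k-core is to peel away nodes of degree less than k until none remain. The sketch below assumes an undirected graph held as a Map of neighbor Sets; the function name and the toy graph are invented for the example.

```javascript
// Return the set of nodes in the k-core by repeatedly removing
// nodes whose remaining degree is below k.
function kCore(adj, k) {
  const alive = new Set(adj.keys());
  const degree = new Map([...adj].map(([n, nbrs]) => [n, nbrs.size]));
  const queue = [...alive].filter((n) => degree.get(n) < k);
  while (queue.length > 0) {
    const u = queue.pop();
    if (!alive.has(u)) continue; // already removed via an earlier entry
    alive.delete(u);
    for (const v of adj.get(u)) {
      if (!alive.has(v)) continue;
      degree.set(v, degree.get(v) - 1);
      if (degree.get(v) < k) queue.push(v);
    }
  }
  return alive;
}

// Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to c.
const adj = new Map([
  ['a', new Set(['b', 'c'])],
  ['b', new Set(['a', 'c'])],
  ['c', new Set(['a', 'b', 'd'])],
  ['d', new Set(['c'])],
]);

console.log([...kCore(adj, 1)]); // every node has degree >= 1
console.log([...kCore(adj, 2)]); // the pendant node d is peeled off
console.log([...kCore(adj, 3)]); // nothing survives
```

Raising k shrinks the result, which matches the slide's point that the subgraph becomes more exclusive.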

Page 8: Social Network Analysis and Visualization

Node Centrality: Identifying important nodes

Betweenness Centrality

Measures how often a node appears on the shortest paths between other nodes in the network.

Closeness Centrality

Average distance from a given node to all other nodes in the graph.
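
Both measures can be derived from breadth-first searches. The sketch below uses the Brandes accumulation technique for betweenness, which is a standard approach and not necessarily what Gephi does internally. It assumes an undirected, unweighted graph, and all names are hypothetical.

```javascript
// Hypothetical toy graph: the undirected path a - b - c - d.
const adj = new Map([
  ['a', new Set(['b'])],
  ['b', new Set(['a', 'c'])],
  ['c', new Set(['b', 'd'])],
  ['d', new Set(['c'])],
]);

// Breadth-first search from `source`, recording for every reached node its
// distance, its number of shortest paths (sigma), and its predecessors.
function bfs(adj, source) {
  const dist = new Map([[source, 0]]);
  const sigma = new Map([[source, 1]]);
  const preds = new Map([[source, []]]);
  const order = [source];
  for (let i = 0; i < order.length; i++) {
    const v = order[i];
    for (const w of adj.get(v)) {
      if (!dist.has(w)) {
        dist.set(w, dist.get(v) + 1);
        sigma.set(w, 0);
        preds.set(w, []);
        order.push(w);
      }
      if (dist.get(w) === dist.get(v) + 1) {
        sigma.set(w, sigma.get(w) + sigma.get(v));
        preds.get(w).push(v);
      }
    }
  }
  return { dist, sigma, preds, order };
}

// Node betweenness via Brandes-style dependency accumulation.
function betweenness(adj) {
  const score = new Map([...adj.keys()].map((n) => [n, 0]));
  for (const source of adj.keys()) {
    const { sigma, preds, order } = bfs(adj, source);
    const delta = new Map(order.map((n) => [n, 0]));
    // Walk nodes from farthest to nearest, skipping the source at index 0.
    for (let i = order.length - 1; i > 0; i--) {
      const w = order[i];
      for (const v of preds.get(w)) {
        const share = (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w));
        delta.set(v, delta.get(v) + share);
      }
      score.set(w, score.get(w) + delta.get(w));
    }
  }
  // In an undirected graph every pair is visited from both ends, so halve.
  for (const [n, value] of score) score.set(n, value / 2);
  return score;
}

// Closeness as worded on the slide: average distance to every other reachable node.
function averageDistance(adj, source) {
  const { dist } = bfs(adj, source);
  let total = 0;
  for (const d of dist.values()) total += d;
  return dist.size > 1 ? total / (dist.size - 1) : 0;
}

console.log(betweenness(adj));
console.log(averageDistance(adj, 'a'), averageDistance(adj, 'b'));
```

On the path graph the two inner nodes carry all the shortest paths, and they also have the smaller average distance, so both measures rank them as the more central nodes.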

Page 9: Social Network Analysis and Visualization

Edge-Betweenness (Hierarchical) Clustering

Girvan–Newman algorithm, O(N^3)

1. Calculate betweenness of all edges in graph.

2. Remove edge with highest betweenness.

3. The betweenness of all edges affected by the removal is recalculated.

4. Rinse and repeat until no edges remain.

Expensive, Yet Intuitive Decomposition of Graph
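
The four steps can be sketched as follows. This is a simplified illustration, not a tuned implementation: it recalculates the betweenness of every edge after each removal instead of only the affected ones, it assumes an undirected graph with string node IDs that do not contain `|`, and all names and the toy graph are hypothetical.

```javascript
// Canonical key for an undirected edge.
const edgeKey = (u, v) => (u < v ? `${u}|${v}` : `${v}|${u}`);

// Step 1: betweenness of all edges (Brandes-style accumulation per source).
function edgeBetweenness(adj) {
  const score = new Map();
  for (const [u, nbrs] of adj) {
    for (const v of nbrs) score.set(edgeKey(u, v), 0);
  }
  for (const source of adj.keys()) {
    const dist = new Map([[source, 0]]);
    const sigma = new Map([[source, 1]]);
    const preds = new Map();
    const order = [source];
    for (let i = 0; i < order.length; i++) {
      const v = order[i];
      for (const w of adj.get(v)) {
        if (!dist.has(w)) {
          dist.set(w, dist.get(v) + 1);
          sigma.set(w, 0);
          preds.set(w, []);
          order.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v));
          preds.get(w).push(v);
        }
      }
    }
    const delta = new Map();
    for (let i = order.length - 1; i > 0; i--) {
      const w = order[i];
      for (const v of preds.get(w)) {
        const share = (sigma.get(v) / sigma.get(w)) * (1 + (delta.get(w) || 0));
        const key = edgeKey(v, w);
        score.set(key, score.get(key) + share);
        delta.set(v, (delta.get(v) || 0) + share);
      }
    }
  }
  return score; // every pair is counted from both ends; fine for ranking
}

// Steps 2-4: remove the top edge, recalculate, repeat until no edges remain.
// Returns the edges in removal order; the input graph is left untouched.
function girvanNewman(adj) {
  const g = new Map([...adj].map(([n, nbrs]) => [n, new Set(nbrs)]));
  const removed = [];
  for (;;) {
    const scores = edgeBetweenness(g);
    if (scores.size === 0) break;
    let best = null;
    let bestScore = -1;
    for (const [key, value] of scores) {
      if (value > bestScore) {
        best = key;
        bestScore = value;
      }
    }
    const [u, v] = best.split('|');
    g.get(u).delete(v);
    g.get(v).delete(u);
    removed.push([u, v]);
  }
  return removed;
}

// Hypothetical toy graph: two triangles joined by the bridge c - d.
const adj = new Map([
  ['a', new Set(['b', 'c'])],
  ['b', new Set(['a', 'c'])],
  ['c', new Set(['a', 'b', 'd'])],
  ['d', new Set(['c', 'e', 'f'])],
  ['e', new Set(['d', 'f'])],
  ['f', new Set(['d', 'e'])],
]);
const removalOrder = girvanNewman(adj);
console.log(removalOrder[0]); // the bridge goes first, splitting the two triangles
```

The communities at any stage are the connected components that remain after the removals so far, which is what makes the result hierarchical.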

Page 10: Social Network Analysis and Visualization

Case Study: Finding @VTCodeCamp Twitter Communities

Methodology

Network: Twitter users as nodes, follows as directed edges.

1. Find all followers of @VTCodeCamp, then recursively find the next level of users.

2. Remove @VTCodeCamp from the final datasets.

3. Twitter RESTful Search API v1.1

4. Node.js Client

5. MongoDB 3.0 Aggregation Framework

6. GEXF - Graph Exchange XML Format

7. Gephi & Gephi Toolkit (JVM) - Analysis & Viz
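
Steps 1 and 2 might look roughly like the sketch below. It is not the presenter's client: `fetchFollowerIds` is a hypothetical async function, supplied by the caller, that returns the follower IDs of one user, and the function names are invented.

```javascript
// Crawl followers breadth-first for a fixed number of levels.
// fetchFollowerIds is a hypothetical async function: (userId) => Promise<string[]>.
async function crawlFollowers(rootId, levels, fetchFollowerIds) {
  const followersOf = new Map(); // userId -> array of follower IDs
  let frontier = [rootId];
  for (let level = 0; level < levels; level++) {
    const next = [];
    for (const userId of frontier) {
      if (followersOf.has(userId)) continue; // already crawled
      const ids = await fetchFollowerIds(userId);
      followersOf.set(userId, ids);
      for (const id of ids) next.push(id);
    }
    frontier = next;
  }
  return followersOf;
}

// Turn the crawl result into directed edges [follower, followed],
// dropping every edge that touches the root account.
function toEdges(followersOf, rootId) {
  const edges = [];
  for (const [userId, ids] of followersOf) {
    if (userId === rootId) continue;
    for (const followerId of ids) {
      if (followerId !== rootId) edges.push([followerId, userId]);
    }
  }
  return edges;
}
```

A call such as `crawlFollowers('vtcodecamp', 2, fetchFollowerIds)` would fetch the account's followers and then the followers of each of those users; the fetch function is where API rate limits have to be respected.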

Page 11: Social Network Analysis and Visualization

Data Flow

Twitter API → Node.js Client → MongoDB → .gexf Format → Gephi / Toolkit

Page 12: Social Network Analysis and Visualization

Twitter Search API

GET followers/ids

https://api.twitter.com/1.1/followers/ids.json?cursor=-1&screen_name=vtcodecamp&count=5000

15 Resource Requests per 15 Minute Window, 5000 max per response (cursor)

Solution: Throttling

GET users/lookup

180 Resource Requests per 15 Minute Window

Throttling
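
A simple way to throttle is to space requests evenly across the rate-limit window. The sketch below assumes that approach; `fetchPage` is a hypothetical async function that performs one followers/ids request for a given cursor and resolves to the parsed response with its `ids` and `next_cursor_str` fields.

```javascript
// Minimum spacing between requests so a rate-limit window is never exceeded.
function minIntervalMs(requestsPerWindow, windowMinutes = 15) {
  return Math.ceil((windowMinutes * 60 * 1000) / requestsPerWindow);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Page through a cursored endpoint, pausing between requests.
// fetchPage is hypothetical: (cursor) => Promise<{ ids, next_cursor_str }>.
async function fetchAllIds(fetchPage, intervalMs) {
  const ids = [];
  let cursor = '-1'; // -1 requests the first page
  while (cursor !== '0') { // a next cursor of 0 means there are no more pages
    const page = await fetchPage(cursor);
    for (const id of page.ids) ids.push(id);
    cursor = page.next_cursor_str;
    if (cursor !== '0') await sleep(intervalMs);
  }
  return ids;
}

console.log(minIntervalMs(15));  // followers/ids: one request per minute
console.log(minIntervalMs(180)); // users/lookup: one request every 5 seconds
```

Even spacing is the most conservative option; a client could instead send a burst of requests and wait for the window to reset.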

Page 13: Social Network Analysis and Visualization
Page 14: Social Network Analysis and Visualization

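Aggregation Framework

KV Pairs of User/Follower

// Emit one { user_id, twitterFollowers } pair per follower of each user.
db.usergraph.aggregate([
  // Keep only the user ID and its array of follower IDs.
  { $project: { _id: 0, user_id: "$user_id_str", twitterFollowers: "$followers.ids_str" } },
  // Flatten the array: one output document per follower ID.
  { $unwind: "$twitterFollowers" }
]);

Rank Followers of @VTCodeCamp by their In-Degree

Aggregate k-cliques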
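
The in-degree ranking named on this slide could be expressed as a pipeline along the following lines. This is a guess at the shape, not the presenter's query: it assumes the same `usergraph` documents as the user/follower example (`user_id_str` plus a `followers.ids_str` array), and the in-memory helper exists only to show what the stages compute.

```javascript
// Assumed pipeline: count followers per user, highest in-degree first.
const inDegreePipeline = [
  { $unwind: '$followers.ids_str' },                          // one document per follower
  { $group: { _id: '$user_id_str', inDegree: { $sum: 1 } } }, // count followers per user
  { $sort: { inDegree: -1 } },                                // rank by in-degree
];
// In the mongo shell: db.usergraph.aggregate(inDegreePipeline)

// Hypothetical in-memory equivalent, useful for checking the logic without a database.
function inDegreeInMemory(docs) {
  return docs
    .map((d) => ({ _id: d.user_id_str, inDegree: d.followers.ids_str.length }))
    .filter((r) => r.inDegree > 0) // $unwind drops documents with empty arrays
    .sort((a, b) => b.inDegree - a.inDegree);
}

// Made-up sample documents in the assumed shape.
const sampleDocs = [
  { user_id_str: '1', followers: { ids_str: ['2', '3'] } },
  { user_id_str: '2', followers: { ids_str: ['1', '3', '4'] } },
  { user_id_str: '3', followers: { ids_str: [] } },
];
console.log(inDegreeInMemory(sampleDocs));
```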

Page 15: Social Network Analysis and Visualization

GEXF Format
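
For reference, a minimal GEXF file holds a `<nodes>` list and an `<edges>` list inside a `<graph>` element. The sketch below writes one from an edge list. It assumes the GEXF 1.2 draft namespace, which the slides do not specify, and the function names are hypothetical.

```javascript
// Escape the characters that are unsafe inside XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialize a directed graph to GEXF. nodeIds is an array of IDs,
// edges is an array of [source, target] pairs.
function toGexf(nodeIds, edges) {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
    '  <graph mode="static" defaultedgetype="directed">',
    '    <nodes>',
    ...nodeIds.map(
      (id) => `      <node id="${escapeXml(id)}" label="${escapeXml(id)}"/>`
    ),
    '    </nodes>',
    '    <edges>',
    ...edges.map(
      ([source, target], i) =>
        `      <edge id="${i}" source="${escapeXml(source)}" target="${escapeXml(target)}"/>`
    ),
    '    </edges>',
    '  </graph>',
    '</gexf>',
  ];
  return lines.join('\n');
}

// Made-up example: follower -> followed.
console.log(toGexf(['alice', 'bob'], [['alice', 'bob']]));
```

The resulting text can be saved with a .gexf extension and opened in Gephi.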

Page 16: Social Network Analysis and Visualization

Graph Stats

vtcodecamp

557 users, 11,058 follows

364,951 users, 1,045,606 follows

Page 17: Social Network Analysis and Visualization

Gephi Toolkit Demo

Page 18: Social Network Analysis and Visualization

Gephi Application Demo

Page 19: Social Network Analysis and Visualization

Iteration #1

MCL Clustering

Betweenness Node Size

Page 20: Social Network Analysis and Visualization

Iteration #1

MCL Clustering

Betweenness Node Size

Diameter = 7

Page 21: Social Network Analysis and Visualization

Software/Data Resources

◦ Applications

Gephi

Tulip

Pajek (Windows)

◦ Packages

NetworkX (Python)

igraph (R or Python)

Statnet (R)

Sigma.js

Gephi Toolkit (JVM)

◦ Data Sets

Gephi Datasets https://github.com/gephi/gephi/wiki/Datasets

UC Irvine - http://networkdata.ics.uci.edu/index.php

Page 22: Social Network Analysis and Visualization

Presentation Resources

GitHub: https://github.com/mirez/snadataviz-vtcc7.git