Social Network Analysis
& Visualization
#Beginner #Twitter #SNA #DataViz #VTCC7
#Gephi #MongoDB #JavaScript #JVM #Node.js
@mirezez Alberto Ramirez
About Presenter
Currently
Systems Architect, iSystems
Previously
Happy ITLAPD 2015!
Graph Basics
Nodes & Edges
Un/directed, Un/weighted
Degree, In-Degree, Out-Degree
Topology
Graph Density: Measures how many edges are in the graph compared to the maximum possible number of edges (complete graph).
Diameter: Length of the longest shortest path between any two nodes in the network
Radius: Minimum eccentricity over all nodes in the graph, where a node's eccentricity is its greatest shortest-path distance to any other node
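A minimal JavaScript sketch of density, diameter and radius, assuming an undirected, unweighted, connected graph stored as a Map from node to an array of neighbors (the function names are illustrative, not from the deck). Distances come from breadth-first search. For a directed graph the maximum is n(n-1) edges, so density would be m / (n(n-1)) instead.

```javascript
// Breadth-first search: shortest-path distance (in hops) from `source` to every reachable node.
function bfsDistances(adj, source) {
  const dist = new Map([[source, 0]]);
  const queue = [source];
  for (let i = 0; i < queue.length; i++) {
    const u = queue[i];
    for (const v of adj.get(u)) {
      if (!dist.has(v)) {
        dist.set(v, dist.get(u) + 1);
        queue.push(v);
      }
    }
  }
  return dist;
}

// Density of an undirected graph: m / (n(n-1)/2) = 2m / (n(n-1)).
function density(adj) {
  const n = adj.size;
  let degreeSum = 0;
  for (const neighbors of adj.values()) degreeSum += neighbors.length;
  const m = degreeSum / 2; // each undirected edge is listed at both endpoints
  return (2 * m) / (n * (n - 1));
}

// Eccentricity: greatest shortest-path distance from `node`; Infinity if some node is unreachable.
function eccentricity(adj, node) {
  const dist = bfsDistances(adj, node);
  if (dist.size < adj.size) return Infinity;
  return Math.max(...dist.values());
}

// Diameter = maximum eccentricity, radius = minimum eccentricity.
function diameterAndRadius(adj) {
  const ecc = [...adj.keys()].map((v) => eccentricity(adj, v));
  return { diameter: Math.max(...ecc), radius: Math.min(...ecc) };
}

// Usage: the path graph a - b - c - d
const path = new Map([
  ["a", ["b"]],
  ["b", ["a", "c"]],
  ["c", ["b", "d"]],
  ["d", ["c"]],
]);
console.log(density(path));           // 0.5
console.log(diameterAndRadius(path)); // { diameter: 3, radius: 2 }
```

The endpoints of a path have eccentricity 3 and the middle nodes 2, which is why diameter and radius differ here.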
Real Networks: Properties
1. Growth: Networks are assembled one node at a time and increase in size.
2. Preferential attachment: As new nodes join the network, the probability that a new node links to a given node is proportional to the number of links (degree) that node already has.
“Rich Get Richer”
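A small sketch of growth plus preferential attachment in the Barabási–Albert style, assuming one new edge per arriving node and a seeded pseudo-random generator (mulberry32) for reproducibility; `growNetwork` is an illustrative name. The trick is a list in which every node appears once per incident edge, so a uniform pick from that list is a pick proportional to degree.

```javascript
// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function growNetwork(n, seed = 1) {
  const rand = mulberry32(seed);
  const edges = [[0, 1]];   // starting network: nodes 0 and 1 joined by one edge
  const degree = [1, 1];
  const endpoints = [0, 1]; // node u appears deg(u) times in this list
  for (let v = 2; v < n; v++) { // growth: one node at a time
    // Uniform pick from `endpoints` selects u with probability deg(u) / 2m ("rich get richer")
    const u = endpoints[Math.floor(rand() * endpoints.length)];
    edges.push([v, u]);
    degree[u] += 1;
    degree.push(1);
    endpoints.push(u, v);
  }
  return { edges, degree };
}

const { edges, degree } = growNetwork(10000);
console.log(edges.length, Math.max(...degree)); // 9999 edges; a few hubs collect many links
```

Early nodes accumulate links fastest because every link they gain makes the next one more likely.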
Social Network Examples
Undirected, Unweighted
Directed, Unweighted
Cliques
• k-clique: a set of k nodes that are all adjacent to each other within the subgraph.
• n-clique, where n is a positive integer: a collection C of vertices in which any two vertices u, v ∈ C have distance ≤ n.
• p-clique, where p is a real number between 0 and 1: a collection C of vertices in which every vertex has ≥ p|C| neighbors in C.
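The three definitions can be checked directly. This is a sketch assuming an undirected graph stored as a Map from node to a Set of neighbors; for the n-clique test it assumes distance is measured in the whole graph rather than only inside C. Function names are illustrative.

```javascript
// Shortest-path distances (hops) from `source`, measured in the whole graph.
function bfsDistances(adj, source) {
  const dist = new Map([[source, 0]]);
  const queue = [source];
  for (let i = 0; i < queue.length; i++) {
    for (const v of adj.get(queue[i])) {
      if (!dist.has(v)) { dist.set(v, dist.get(queue[i]) + 1); queue.push(v); }
    }
  }
  return dist;
}

// Clique: every pair of distinct members is adjacent.
function isClique(adj, C) {
  return C.every((u) => C.every((v) => u === v || adj.get(u).has(v)));
}

// n-clique: every pair of members is within distance n.
function isNClique(adj, C, n) {
  return C.every((u) => {
    const dist = bfsDistances(adj, u);
    return C.every((v) => dist.has(v) && dist.get(v) <= n);
  });
}

// p-clique: every member has at least p|C| neighbors inside C.
function isPClique(adj, C, p) {
  const members = new Set(C);
  return C.every((u) => {
    let inside = 0;
    for (const v of adj.get(u)) if (members.has(v)) inside += 1;
    return inside >= p * C.length;
  });
}

// Triangle a-b-c plus a pendant node d attached to c
const g = new Map([
  ["a", new Set(["b", "c"])],
  ["b", new Set(["a", "c"])],
  ["c", new Set(["a", "b", "d"])],
  ["d", new Set(["c"])],
]);
console.log(isClique(g, ["a", "b", "c"]));            // true
console.log(isNClique(g, ["a", "b", "d"], 2));        // true: a and b reach d through c
console.log(isPClique(g, ["a", "b", "c", "d"], 0.5)); // false: d has only 1 of the 2 required
```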
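Trouble with Clique Targeting
1. Cliques are not resilient: removing a single edge breaks the clique.
2. Because every clique is defined the same way (all members fully connected), a clique can yield little or no insight into the structure of its subgraph.
3. The clique might be only a narrow slice of a larger, more meaningful community that should be evaluated instead.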
Finding Cliques
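The deck does not say which clique-finding algorithm was used. One standard choice is Bron–Kerbosch with pivoting, which enumerates all maximal cliques; the sketch below assumes an undirected graph as a Map from node to a Set of neighbors, and `maximalCliques` is an illustrative name.

```javascript
// Bron–Kerbosch with pivoting: R = current clique, P = candidates, X = already processed.
function maximalCliques(adj) {
  const cliques = [];
  function expand(R, P, X) {
    if (P.size === 0 && X.size === 0) {
      cliques.push([...R]); // R cannot be extended: it is a maximal clique
      return;
    }
    // Pivot: the vertex of P ∪ X with the most neighbors in P
    let pivot = null;
    let best = -1;
    for (const u of [...P, ...X]) {
      let count = 0;
      for (const v of adj.get(u)) if (P.has(v)) count += 1;
      if (count > best) { best = count; pivot = u; }
    }
    for (const v of [...P]) {
      if (adj.get(pivot).has(v)) continue; // neighbors of the pivot are covered by other branches
      const Nv = adj.get(v);
      expand(
        new Set([...R, v]),
        new Set([...P].filter((w) => Nv.has(w))),
        new Set([...X].filter((w) => Nv.has(w)))
      );
      P.delete(v);
      X.add(v);
    }
  }
  expand(new Set(), new Set(adj.keys()), new Set());
  return cliques;
}

// Triangle a-b-c plus a pendant node d attached to c
const g = new Map([
  ["a", new Set(["b", "c"])],
  ["b", new Set(["a", "c"])],
  ["c", new Set(["a", "b", "d"])],
  ["d", new Set(["c"])],
]);
console.log(maximalCliques(g).map((c) => c.sort().join(",")).sort()); // [ 'a,b,c', 'c,d' ]
```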
K-Cores: Maximal subgraph in which every node has degree at least k (counting only edges inside the subgraph).
In graph G, as k increases, the k-core shrinks and becomes more exclusive.
Finding K-Cores
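A common way to find a k-core is repeated peeling: delete every node whose degree is below k, update its neighbors' degrees, and repeat until nothing changes. A minimal sketch, assuming an undirected graph as a Map from node to a Set of neighbors (`kCore` is an illustrative name):

```javascript
// Returns the nodes of the k-core (empty array if the k-core is empty).
function kCore(adj, k) {
  const degree = new Map();
  for (const [v, nbrs] of adj) degree.set(v, nbrs.size);
  const removed = new Set();
  const queue = [...adj.keys()].filter((v) => degree.get(v) < k);
  queue.forEach((v) => removed.add(v));
  while (queue.length > 0) {
    const v = queue.pop();
    for (const w of adj.get(v)) {
      if (removed.has(w)) continue;
      // Removing v lowers w's degree; w may now fall below k as well
      degree.set(w, degree.get(w) - 1);
      if (degree.get(w) < k) { removed.add(w); queue.push(w); }
    }
  }
  return [...adj.keys()].filter((v) => !removed.has(v));
}

// Triangle a-b-c, with a chain a - d - e hanging off it
const g = new Map([
  ["a", new Set(["b", "c", "d"])],
  ["b", new Set(["a", "c"])],
  ["c", new Set(["a", "b"])],
  ["d", new Set(["a", "e"])],
  ["e", new Set(["d"])],
]);
console.log(kCore(g, 2)); // [ 'a', 'b', 'c' ]: removing e drops d below 2
console.log(kCore(g, 3)); // []
```

Node d starts with degree 2 but is peeled once e goes, which shows why a single pass over the initial degrees is not enough.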
Node Centrality: Identifying important nodes
Betweenness Centrality
Measures how often a node lies on the shortest paths between other pairs of nodes in the network
Closeness Centrality
Based on the average distance from a given node to all other nodes in the graph; usually reported as the reciprocal, so nodes close to everyone score highest
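Both measures can be computed from one breadth-first search per node. The sketch below uses Brandes' accumulation for betweenness and (n-1) / (sum of distances) for closeness, assuming a connected, undirected, unweighted graph with at least two nodes, stored as a Map from node to an array of neighbors (`centralities` is an illustrative name).

```javascript
function centralities(adj) {
  const nodes = [...adj.keys()];
  const betweenness = new Map(nodes.map((v) => [v, 0]));
  const closeness = new Map();
  for (const s of nodes) {
    // BFS from s: distances, number of shortest paths (sigma), predecessors
    const dist = new Map([[s, 0]]);
    const sigma = new Map([[s, 1]]);
    const preds = new Map(nodes.map((v) => [v, []]));
    const order = [s];
    for (let i = 0; i < order.length; i++) {
      const u = order[i];
      for (const w of adj.get(u)) {
        if (!dist.has(w)) {
          dist.set(w, dist.get(u) + 1);
          sigma.set(w, 0);
          order.push(w);
        }
        if (dist.get(w) === dist.get(u) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(u));
          preds.get(w).push(u);
        }
      }
    }
    // Closeness: reciprocal of the average distance to all other nodes
    let total = 0;
    for (const d of dist.values()) total += d;
    closeness.set(s, (nodes.length - 1) / total);
    // Dependencies accumulated from the farthest nodes back towards s
    const delta = new Map(nodes.map((v) => [v, 0]));
    for (let i = order.length - 1; i > 0; i--) {
      const w = order[i];
      for (const v of preds.get(w)) {
        delta.set(v, delta.get(v) + (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w)));
      }
      betweenness.set(w, betweenness.get(w) + delta.get(w));
    }
  }
  // Undirected graph: each pair was counted once from each endpoint
  for (const v of nodes) betweenness.set(v, betweenness.get(v) / 2);
  return { betweenness, closeness };
}

// Usage: the path graph a - b - c - d
const path = new Map([["a", ["b"]], ["b", ["a", "c"]], ["c", ["b", "d"]], ["d", ["c"]]]);
const { betweenness, closeness } = centralities(path);
console.log(Object.fromEntries(betweenness)); // { a: 0, b: 2, c: 2, d: 0 }
console.log(Object.fromEntries(closeness));   // { a: 0.5, b: 0.75, c: 0.75, d: 0.5 }
```

Node b sits on the shortest paths a–c and a–d, giving it betweenness 2, while the endpoints lie on no path between other nodes.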
Edge-Betweenness (Hierarchical) Clustering
Girvan–Newman algorithm: O(M²N) for M edges and N nodes, i.e. O(N³) on sparse graphs
1. Calculate the betweenness of all edges in the graph.
2. Remove the edge with the highest betweenness.
3. Recalculate the betweenness of all edges affected by the removal.
4. Rinse and repeat from step 2 until no edges remain.
Expensive, yet intuitive, decomposition of the graph
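A sketch of the loop above that stops at the first split, i.e. when a removal first increases the number of connected components. Simplifications, stated plainly: betweenness is recomputed for all edges after each removal (step 3 only requires the affected ones), ties go to the first edge found, and node ids are assumed to be strings without a "|" character. Function names are illustrative.

```javascript
const edgeKey = (u, v) => (u < v ? `${u}|${v}` : `${v}|${u}`);

// Edge betweenness via Brandes-style accumulation (undirected, unweighted).
function edgeBetweenness(adj) {
  const eb = new Map();
  for (const s of adj.keys()) {
    const dist = new Map([[s, 0]]);
    const sigma = new Map([[s, 1]]);
    const preds = new Map([[s, []]]);
    const order = [s];
    for (let i = 0; i < order.length; i++) {
      const u = order[i];
      for (const w of adj.get(u)) {
        if (!dist.has(w)) { dist.set(w, dist.get(u) + 1); sigma.set(w, 0); preds.set(w, []); order.push(w); }
        if (dist.get(w) === dist.get(u) + 1) { sigma.set(w, sigma.get(w) + sigma.get(u)); preds.get(w).push(u); }
      }
    }
    const delta = new Map(order.map((v) => [v, 0]));
    for (let i = order.length - 1; i > 0; i--) {
      const w = order[i];
      for (const v of preds.get(w)) {
        const c = (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w));
        eb.set(edgeKey(v, w), (eb.get(edgeKey(v, w)) || 0) + c);
        delta.set(v, delta.get(v) + c);
      }
    }
  }
  for (const [key, value] of eb) eb.set(key, value / 2); // each pair counted from both ends
  return eb;
}

// Connected components, each sorted, in order of first appearance.
function components(adj) {
  const seen = new Set();
  const result = [];
  for (const start of adj.keys()) {
    if (seen.has(start)) continue;
    const comp = [start];
    seen.add(start);
    for (let i = 0; i < comp.length; i++) {
      for (const w of adj.get(comp[i])) if (!seen.has(w)) { seen.add(w); comp.push(w); }
    }
    result.push(comp.sort());
  }
  return result;
}

function girvanNewmanFirstSplit(adj) {
  const g = new Map([...adj].map(([v, nbrs]) => [v, new Set(nbrs)])); // work on a copy
  const startCount = components(g).length;
  while (true) {
    const eb = edgeBetweenness(g);
    if (eb.size === 0) return components(g); // no edges left
    let bestKey = null;
    let best = -Infinity;
    for (const [key, value] of eb) if (value > best) { best = value; bestKey = key; }
    const [u, v] = bestKey.split("|");
    g.get(u).delete(v); // remove the highest-betweenness edge
    g.get(v).delete(u);
    const comps = components(g);
    if (comps.length > startCount) return comps;
  }
}

// Two triangles joined by the bridge c - d
const barbell = new Map([
  ["a", ["b", "c"]], ["b", ["a", "c"]], ["c", ["a", "b", "d"]],
  ["d", ["c", "e", "f"]], ["e", ["d", "f"]], ["f", ["d", "e"]],
]);
console.log(girvanNewmanFirstSplit(barbell)); // [ [ 'a', 'b', 'c' ], [ 'd', 'e', 'f' ] ]
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge removed.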
Case Study: Finding @VTCodeCamp Twitter Communities
Methodology
Network: Twitter users as nodes, follows as directed edges.
1. Find all followers of @VTCodeCamp, then recursively find the next level of users.
2. Remove @VTCodeCamp from the final datasets.
3. Twitter RESTful Search API v1.1
4. Node.js Client
5. MongoDB 3.0 Aggregation Framework
6. GEXF - Graph Exchange Xml Format
7. Gephi & Gephi Toolkit (JVM) - Analysis & Viz
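Steps 1 and 2 can be sketched as a breadth-first crawl. This is a hypothetical outline, not the presenter's client: `fetchFollowers` stands in for whatever Twitter API call returns a user's follower ids, the crawl depth of 2 is an assumption, and edges point from follower to followed user.

```javascript
// Breadth-first follower crawl, then removal of the seed account.
async function crawlFollowers(seed, fetchFollowers, depth = 2) {
  const edges = []; // [follower, followed] pairs (directed)
  const visited = new Set([seed]);
  let frontier = [seed];
  for (let level = 0; level < depth; level++) {
    const next = [];
    for (const user of frontier) {
      for (const follower of await fetchFollowers(user)) {
        edges.push([follower, user]);
        if (!visited.has(follower)) {
          visited.add(follower);
          next.push(follower); // expand this user at the next level
        }
      }
    }
    frontier = next;
  }
  // Step 2: drop the seed account and every edge touching it
  visited.delete(seed);
  return { users: [...visited], edges: edges.filter(([a, b]) => a !== seed && b !== seed) };
}
```

In a real run `fetchFollowers` would also need the throttling described on the following slides.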
Data Flow
Twitter API → Node.js Client → MongoDB → .gexf Format → Gephi / Toolkit
GET followers/ids
https://api.twitter.com/1.1/followers/ids.json?cursor=-1&screen_name=vtcodecamp&count=5000
15 requests per 15-minute window, 5,000 IDs max per response (paginated with cursor)
Solution: Throttling
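A sketch of cursor paging with throttling: 15 requests per 15-minute window works out to one request every 60 seconds. Assumptions: `getPage` is a hypothetical async function that performs the followers/ids call for a given cursor and returns the parsed JSON, which is assumed to carry `ids` and `next_cursor_str` ("0" on the last page); `sleep` is injectable so the throttle can be tested without waiting. The throttle here only spaces requests within one account's pages; a full crawl would share it across all accounts.

```javascript
const WINDOW_MS = 15 * 60 * 1000;
const REQUESTS_PER_WINDOW = 15;
const INTERVAL_MS = WINDOW_MS / REQUESTS_PER_WINDOW; // 60,000 ms between requests

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchAllFollowerIds(getPage, sleep = defaultSleep) {
  const ids = [];
  let cursor = "-1"; // cursor -1 requests the first page
  let first = true;
  while (cursor !== "0") { // a next cursor of 0 means there are no more pages
    if (!first) await sleep(INTERVAL_MS); // stay under the rate limit
    first = false;
    const page = await getPage(cursor);
    ids.push(...page.ids); // up to 5,000 ids per page
    cursor = page.next_cursor_str;
  }
  return ids;
}
```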
Twitter Search API
GET users/lookup
180 requests per 15-minute window
Throttling
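GET users/lookup accepts up to 100 user ids per request as a comma-separated list, so follower ids are batched before lookup; 180 requests per 15-minute window means one request every 15 × 60 / 180 = 5 seconds. A small sketch of that arithmetic (function names are illustrative):

```javascript
const LOOKUP_BATCH = 100;                          // max ids per users/lookup request
const LOOKUP_INTERVAL_MS = (15 * 60 * 1000) / 180; // 5,000 ms between requests

// Split ids into comma-separated groups of at most `size`.
function lookupBatches(ids, size = LOOKUP_BATCH) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size).join(","));
  }
  return batches;
}

// Minimum time to look up `count` users at the throttled rate (first request sent immediately).
function lookupDurationMs(count) {
  const requests = Math.ceil(count / LOOKUP_BATCH);
  return Math.max(0, requests - 1) * LOOKUP_INTERVAL_MS;
}

console.log(lookupBatches([...Array(250).keys()]).length); // 3
console.log(lookupDurationMs(250));                        // 10000
```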
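KV Pairs of User/Follower
db.usergraph.aggregate([
  // Keep only the user id and that user's array of follower ids (as strings)
  { $project : { _id : 0, user_id : "$user_id_str",
                 twitterFollowers : "$followers.ids_str" } },
  // Emit one document per follower: a (user_id, follower) key/value pair
  { $unwind : "$twitterFollowers" }
]);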
Aggregation Framework
Rank followers of @VTCodeCamp by their in-degree
Aggregate k-cliques
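The ranking can be illustrated in memory from the (user_id, twitterFollowers) pairs produced by the pipeline above: each pair is one follow edge into `user_id`, so counting pairs per `user_id` gives in-degree. In MongoDB the same idea would be a `$group` stage with `{ $sum: 1 }` followed by `$sort`. This is a sketch, not the presenter's query; `rankByInDegree` and the `vtFollowers` input (the list of @VTCodeCamp follower ids) are assumptions.

```javascript
// Count incoming follows for each @VTCodeCamp follower, highest first.
function rankByInDegree(pairs, vtFollowers) {
  const inDegree = new Map(vtFollowers.map((id) => [id, 0]));
  for (const { user_id } of pairs) {
    // Each pair is one edge: twitterFollowers -> user_id
    if (inDegree.has(user_id)) inDegree.set(user_id, inDegree.get(user_id) + 1);
  }
  return [...inDegree].sort((a, b) => b[1] - a[1]);
}

const pairs = [
  { user_id: "1", twitterFollowers: "2" },
  { user_id: "1", twitterFollowers: "3" },
  { user_id: "2", twitterFollowers: "3" },
  { user_id: "9", twitterFollowers: "1" }, // user 9 is not a @VTCodeCamp follower
];
console.log(rankByInDegree(pairs, ["1", "2", "3"])); // [ [ '1', 2 ], [ '2', 1 ], [ '3', 0 ] ]
```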
GEXF Format
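A minimal sketch of writing a directed graph as GEXF 1.2 so that Gephi can open it. Only node ids, labels and edges are written; attributes, weights and visualization data are omitted, and `toGexf` is an illustrative name rather than part of any library.

```javascript
// Escape characters that are not allowed verbatim in XML attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// nodes: [{ id, label }], edges: [[source, target]] (directed, follower -> followed)
function toGexf(nodes, edges) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
    '  <graph mode="static" defaultedgetype="directed">',
    "    <nodes>",
    ...nodes.map((n) => `      <node id="${escapeXml(n.id)}" label="${escapeXml(n.label)}"/>`),
    "    </nodes>",
    "    <edges>",
    ...edges.map(([source, target], i) =>
      `      <edge id="${i}" source="${escapeXml(source)}" target="${escapeXml(target)}"/>`),
    "    </edges>",
    "  </graph>",
    "</gexf>",
  ].join("\n");
}

console.log(toGexf([{ id: "1", label: "A&B" }, { id: "2", label: "vtcc" }], [["1", "2"]]));
```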
Graph Stats
vtcodecamp: 557 users, 11,058 follows
364,951 users, 1,045,606 follows
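Applying the density definition from Graph Basics to these counts, assuming each follow is one directed edge and there are no self-loops (so at most n(n-1) edges):

```latex
\begin{aligned}
D &= \frac{m}{n(n-1)} \\
D_{\text{vtcodecamp}} &= \frac{11{,}058}{557 \times 556} = \frac{11{,}058}{309{,}692} \approx 0.0357 \\
D_{\text{364,951 users}} &= \frac{1{,}045{,}606}{364{,}951 \times 364{,}950} \approx 7.85 \times 10^{-6}
\end{aligned}
```

Both graphs are sparse; the larger one is sparser by more than three orders of magnitude.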
Gephi Toolkit Demo
Gephi Application Demo
Iteration #1
MCL Clustering
Betweenness Node Size
Diameter = 7
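The slides do not show the MCL settings used. As a rough illustration of Markov Cluster (MCL) itself, here is a sketch that alternates expansion (squaring the column-stochastic matrix, i.e. a two-step random walk) and inflation (raising entries to a power, then renormalizing) until the matrix stops changing. The parameters (self-loops added, expansion power 2, inflation 2) are common defaults and an assumption here, not necessarily what Gephi used.

```javascript
// Make every column sum to 1 (in place).
function normalizeColumns(M) {
  const n = M.length;
  for (let j = 0; j < n; j++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += M[i][j];
    for (let i = 0; i < n; i++) M[i][j] = sum > 0 ? M[i][j] / sum : 0;
  }
  return M;
}

function multiply(A, B) {
  const n = A.length;
  const C = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++)
    for (let k = 0; k < n; k++)
      for (let j = 0; j < n; j++) C[i][j] += A[i][k] * B[k][j];
  return C;
}

// adjMatrix: symmetric 0/1 adjacency matrix; returns clusters as arrays of node indices.
function mcl(adjMatrix, inflation = 2, maxIter = 100, tol = 1e-9) {
  const n = adjMatrix.length;
  // Add self-loops, then make the matrix column-stochastic
  let M = normalizeColumns(adjMatrix.map((row, i) => row.map((x, j) => (i === j ? 1 : x))));
  for (let iter = 0; iter < maxIter; iter++) {
    const expanded = multiply(M, M); // expansion
    const inflated = normalizeColumns(expanded.map((row) => row.map((x) => x ** inflation))); // inflation
    let change = 0;
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++) change = Math.max(change, Math.abs(inflated[i][j] - M[i][j]));
    M = inflated;
    if (change < tol) break; // converged
  }
  // Rows with mass on the diagonal are attractors; the nonzero entries of each attractor row form a cluster
  const clusters = new Map();
  for (let i = 0; i < n; i++) {
    if (M[i][i] <= 1e-6) continue;
    const members = [];
    for (let j = 0; j < n; j++) if (M[i][j] > 1e-6) members.push(j);
    clusters.set(members.join(","), members); // identical clusters are stored once
  }
  return [...clusters.values()];
}

// Two separate triangles: nodes 0-1-2 and 3-4-5
const A = [
  [0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
  [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0],
];
console.log(mcl(A)); // [ [ 0, 1, 2 ], [ 3, 4, 5 ] ]
```

Raising the inflation parameter generally produces more, smaller clusters.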
Software/Data Resources
◦ Applications
Gephi
Tulip
Pajek (Windows)
◦ Packages
NetworkX (Python)
igraph (R or Python)
Statnet (R)
Sigma.js
Gephi Toolkit (JVM)
◦ Data Sets
Gephi Datasets: https://github.com/gephi/gephi/wiki/Datasets
UC Irvine: http://networkdata.ics.uci.edu/index.php
Presentation Resources
GitHub: https://github.com/mirez/snadataviz-vtcc7.git