Social network partition

Presenters: Xiaofei Cao, Partick Berg


Problem Statement

• Say we have a graph of nodes. The nodes could represent anything, but for our purposes let’s say they represent a population spread out across a country. Some of these nodes are clustered closely together, forming a “community”, while others lie further away and form a different “community”. We want to detect the communities in this graph. So how do we do this?


Some Definitions

• Before we try to develop a solution to this problem, we should get a few common definitions out of the way.

• Degree – The degree of a node is the number of edges connected to it.

• Community – A community is a set of nodes, grouped by some similarity, that are densely connected internally.
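The degree definition can be made concrete with a minimal sketch. The edge list below is a hypothetical example, not the graph drawn in the slides, and the function name `degrees` is our own.

```python
from collections import defaultdict

def degrees(edges):
    """Return a dict mapping each node to its degree in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        # Every undirected edge adds one to the degree of both endpoints.
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical edge list (an assumption for illustration only).
example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
example_degrees = degrees(example_edges)
```

In this toy graph, node C touches three edges and node D touches one.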

[Figure: example graph with nodes A through H. The degree of C is 3; the degree of D is 2.]


Why find Communities?

• We want to find communities so we can see how groups relate to one another and how they connect to others.

• We can use this to easily find groups of people who share particular traits, such as terrorist organizations or groups in any other social network.


How do we find Communities?

• Vertex Betweenness – This is a measure of a vertex’s (node’s) centrality within the graph. It counts the number of times a node acts as a bridge along a shortest path between two other nodes.


Use BFS to find shortest-paths

• We use the BFS (Breadth First Search) Algorithm to find the shortest paths between each node and every other node.

• From this we can calculate the vertex betweenness for each node.
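As a rough illustration of the BFS step, the sketch below computes shortest-path lengths from one source and then repeats that for every node. The adjacency-list format, the function name `bfs_distances`, and the small graph are assumptions made for this example. It stops at distances and does not yet compute betweenness.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # In an unweighted graph the first visit is along a shortest path.
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical 5-node graph (not taken from the slides).
example_adj = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5],
    5: [4],
}

# One BFS per node gives the distances between every pair of nodes.
all_pairs = {s: bfs_distances(example_adj, s) for s in example_adj}
```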


Girvan Newman Algorithm

• We can use the Girvan Newman algorithm to detect communities in the graph.

• Girvan Newman takes the “Betweenness” score and extends the definition to edges.

• So an edge’s “betweenness” score is the number of shortest paths between pairs of nodes that run along it.

• If there is more than one shortest path between a pair of nodes, each path is given an equal weight, so that the weights for that pair sum to one.
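One common way to write the edge score described above is shown here. The symbols are our own notation, not the slides': V is the set of nodes, \sigma_{st} is the number of shortest paths between nodes s and t, and \sigma_{st}(e) is the number of those paths that run along edge e.

```latex
B(e) \;=\; \sum_{\substack{\{s,t\} \subseteq V \\ s \neq t}} \frac{\sigma_{st}(e)}{\sigma_{st}}
```

Each of the \sigma_{st} shortest paths for a pair contributes 1/\sigma_{st}, which is the equal weighting mentioned above. For example, if two nodes are joined by exactly two shortest paths, every edge on either path receives 1/2 from that pair.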


Girvan Newman Algorithm Continued

With this method of edge “betweenness” scoring, edges between nodes inside a community get lower scores, while edges that connect one community to another get higher scores.

To find the communities, we remove the highest-scoring edge and recalculate the “betweenness” score for each of the affected edges.


Example

[Figure: example graph with nodes A, B, C, and D.]

The highest edge score is 6, on the edge connecting node A to node C, so we remove this edge first.


Girvan Newman Algorithm Continued

• We continue to remove the highest-scoring edge from the graph and recalculate, until no edges remain.

• The end result is a dendrogram that shows how the graph splits into communities.
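The removal loop described above can be sketched as follows. This is a minimal illustration under several assumptions: the graph is a dict of neighbor sets with no self-loops, all function names are our own, and every score is recomputed after each removal for simplicity rather than only the affected ones. The recorded splits are the levels one would draw in the dendrogram.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected graph given as {node: set(neighbors)}."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Top-down BFS: distances and shortest-path counts from s.
        dist, paths, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        # Bottom-up pass: split each node's score over the edges above it.
        node_score = {u: 1.0 for u in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    credit = node_score[v] * paths[u] / paths[v]
                    score[frozenset((u, v))] += credit
                    node_score[u] += credit
    # Every pair of nodes was counted once from each end.
    return {e: b / 2 for e, b in score.items()}

def components(adj):
    """Return the connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_splits(adj):
    """Remove highest-scoring edges until none remain; record every split."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    splits, count = [], len(components(adj))
    while any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > count:
            count = len(comps)
            splits.append(sorted(sorted(c) for c in comps))
    return splits

# Hypothetical graph: two triangles joined by the bridge 3-4.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
splits = girvan_newman_splits(example)
```

On this toy graph the bridge carries all nine shortest paths between the two triangles, so it is removed first and the first recorded split separates the triangles.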

Sequential Algorithm

• Proposed by Girvan and Newman in the paper "Community structure in social and biological networks." Proceedings of the National Academy of Sciences 99.12 (2002): 7821-7826.

• The complete algorithm is given in the paper "Finding and evaluating community structure in networks." Physical Review E 69.2 (2004): 026113.

Girvan Newman algorithm

• Goal: find the edge with the highest betweenness score and remove it. Continue doing this until the graph has been partitioned.

• Input: the graph for the current iteration (adjacency matrix).

• Output: the betweenness score for every edge (betweenness matrix).

• The algorithm can be separated into 2 parts.

Part I: Find the number of shortest paths from one node to every other node

• Work from the top down.

• Use the breadth-first algorithm to generate a new view of the graph from that node.

• Find the number of shortest paths to each node.
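A minimal sketch of Part I, assuming an adjacency-list graph: the BFS builds the layered "view" from the starting node and counts the shortest paths to each node along the way. The function name `count_shortest_paths` and the small diamond-shaped graph are hypothetical and are not the 8-node example from the slides.

```python
from collections import deque

def count_shortest_paths(adj, source):
    """Return (layers, paths): the BFS layers seen from source and, for each
    node, the number of distinct shortest paths from source to that node."""
    dist = {source: 0}
    paths = {source: 1}
    layers = [[source]]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # First time we see v: it sits one layer below u.
                dist[v] = dist[u] + 1
                paths[v] = 0
                if dist[v] == len(layers):
                    layers.append([])
                layers[dist[v]].append(v)
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to v.
                paths[v] += paths[u]
    return layers, paths

# Hypothetical graph: a diamond 1-2-4, 1-3-4 with a tail 4-5.
example_adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
layers, paths = count_shortest_paths(example_adj, 1)
```

Node 4 can be reached through node 2 or node 3, so it has two shortest paths, and node 5 inherits both of them.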

[Figure: example graph with nodes 1 to 8 and its view from node 1, annotated with shortest-path counts.]

Part II: Calculate the edge betweenness scores for every iteration

• Work from the bottom up.

• Every node holds one score.

• Every edge’s score equals Node_score / #shortest_path * (# of shortest paths to the upper-layer node).

• Sum up the edges’ scores over every iteration.
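The edge-score rule above can be written out with a small worked case. The notation is our own: s_v is the score held by node v, \sigma_v is the number of shortest paths from the starting node to v, and u is a neighbor of v in the layer above. We also assume, as in the Newman and Girvan algorithm, that each node starts with a score of 1 and adds the scores of the edges below it.

```latex
\mathrm{score}(u, v) = s_v \cdot \frac{\sigma_u}{\sigma_v},
\qquad
s_u = 1 + \sum_{v \text{ below } u} \mathrm{score}(u, v)
```

As a hypothetical example, take a bottom-layer node v with s_v = 1 and \sigma_v = 3, whose upper-layer neighbors a and b have \sigma_a = 2 and \sigma_b = 1:

```latex
\mathrm{score}(a, v) = 1 \cdot \tfrac{2}{3} = \tfrac{2}{3},
\qquad
\mathrm{score}(b, v) = 1 \cdot \tfrac{1}{3} = \tfrac{1}{3}
```

The two edge scores add up to s_v, so the node's score is fully distributed upward.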

[Figure: the example graph with nodes 1 to 8 and its view from node 1, now with a score on each edge of the view.]

Score = Node_score / #shortest_path * (# of shortest paths to the upper-layer node)

Analysis of the time complexity

• Number of iterations of the main loop: n (the number of nodes).

• Time complexity of finding the shortest paths: O(n^2).

• Time complexity of calculating the betweenness scores: O(n).

• Adding to the betweenness matrix: O(n^2).

• Total time complexity: n*(n^2+n+n^2) = O(n^3).

Parallel algorithm (intuitively)

• Give every processor the same adjacency matrix of the original network.

• The processors start from different nodes. Each one generates the views and calculates the betweenness matrix for each of its starting nodes, then sums those matrices locally.

• Run a prefix sum across the processors, then update the original network by removing the highest-scoring edge.
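The global summation step can be sketched as a parallel prefix sum that finishes in log2(p) rounds. The version below simulates the processors in an ordinary loop instead of running them concurrently, uses the Hillis-Steele scan pattern as an assumed choice of prefix-sum scheme, and works on small made-up matrices. All names are hypothetical.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of lists)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def prefix_sum(local):
    """Inclusive prefix sum over per-processor matrices.

    Returns (values, rounds). After the last round, processor i holds the sum
    of the local matrices of processors 0..i, so the last processor holds the
    global betweenness matrix.
    """
    vals = list(local)
    p = len(vals)
    step, rounds = 1, 0
    while step < p:
        # In one round, every processor i >= step adds the value that
        # processor i - step held at the start of the round.
        vals = [
            vals[i] if i < step else add_matrices(vals[i - step], vals[i])
            for i in range(p)
        ]
        step *= 2
        rounds += 1
    return vals, rounds

# Hypothetical local results for 8 processors: 2x2 matrices with i+1 off the diagonal.
local_sums = [[[0, i + 1], [i + 1, 0]] for i in range(8)]
scanned, rounds = prefix_sum(local_sums)
global_matrix = scanned[-1]
```

With 8 processors the scan takes 3 rounds, and each round adds whole n by n matrices, which is where a cost of n^2 * log(p) for the global step comes from.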

[Figure: example with 24 starting nodes and 8 processors, P1 to P8. Gn: start from node n in the graph.]

• Each processor receives three starting nodes: P1 gets G1, G2, G3; P2 gets G4, G5, G6; and so on up to P8, which gets G22, G23, G24.

• Each processor runs the breadth-first algorithm from its starting nodes, producing the views V1 to V24 (three per processor).

• Each processor sums its betweenness scores locally, giving B1 to B8.

• A parallel prefix sum combines B1 to B8 across the processors.

• The value at B8 is used to update the network.

Analysis of the time complexity

• Number of iterations per processor: n/p.

• Finding the number of shortest paths: O(n^2).

• Finding the betweenness scores: O(n).

• Adding the betweenness scores locally: O(n^2).

• Adding the betweenness scores globally (prefix sum): O(n^2*log(p)).

• Total time complexity: n/p*(n^2+n+n^2) + n^2*log(p) = O(n^2*(n/p+log(p))).

Continued

• Speedup: n^3 / (n^2*(n/p+log(p))) = n/(n/p+log(p)).

• When n = p*log(p), the speedup is p/2, which is proportional to p, so the algorithm is cost-optimal.
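The speedup and cost-optimality claims can be checked with a short derivation. It keeps only the leading terms of the sequential time T_1 and the parallel time T_p, and the symbols T_1, T_p and S are our own notation.

```latex
\begin{align*}
T_1 &= n\,(n^2 + n + n^2) = O(n^3) \\
T_p &= \frac{n}{p}\,(n^2 + n + n^2) + n^2 \log p
     = O\!\left(n^2\left(\frac{n}{p} + \log p\right)\right) \\
S &= \frac{T_1}{T_p} \approx \frac{n^3}{n^2\,(n/p + \log p)}
   = \frac{n}{n/p + \log p} \\
n = p \log p:\quad
S &= \frac{p \log p}{\log p + \log p} = \frac{p}{2} = \Theta(p) \\
p \cdot T_p &\approx p \cdot n^2 \cdot 2 \log p = 2\,n^2\,(p \log p) = 2\,n^3 = \Theta(n^3)
\end{align*}
```

The cost p * T_p grows at the same rate as the sequential time, which is what cost-optimal means here.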


Question