Integrating Network Discovery and Community Detection (IRE IIITH) Team 24

Team No. 24: Integrating Network Discovery and Community Detection

Nikhil Daliya - 201301142
Athresh G - 201505565

Overview

Integrating network discovery and community detection routines for the nodes of a given network, and identifying the characteristic of each node (constant or rapidly changing) in the network.

Dataset: Railway

The railway network, proposed by [Ghosh et al. 2011], consists of nodes representing railway stations in India. Two stations s_i and s_j are connected by an edge if there is at least one train route on which both s_i and s_j are scheduled halts. The communities are the states/provinces of India, since the number of trains within a state is much higher than the number of trains running between two states.
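The edge rule for the railway network can be illustrated with a minimal sketch. The function name `build_station_graph` and the toy routes are hypothetical and only stand in for the real data, whose file format is not described in these slides.

```python
from itertools import combinations


def build_station_graph(routes):
    """Return an adjacency dict: station -> set of stations sharing a route."""
    adjacency = {}
    for halts in routes:
        # Register every station, even one on a single-halt route.
        for station in halts:
            adjacency.setdefault(station, set())
        # Any two scheduled halts on the same route are joined by an edge.
        for s_i, s_j in combinations(set(halts), 2):
            adjacency[s_i].add(s_j)
            adjacency[s_j].add(s_i)
    return adjacency


# Hypothetical toy routes (not taken from the real dataset).
routes = [
    ["Hyderabad", "Warangal", "Vijayawada"],
    ["Vijayawada", "Chennai"],
]
graph = build_station_graph(routes)
```

In this toy example Vijayawada is linked to all three other stations, while Hyderabad and Chennai are not linked because no single route halts at both.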

Dataset: Football

The football network, proposed by [Girvan and Newman 2002a], contains the network of American football games between Division IA colleges during the regular season of Fall 2000. The vertices in the graph represent teams (identified by their college names), and edges represent regular-season games between the two teams they connect.

The teams are divided into conferences (indicating communities) containing around 8-12 teams each. Games are more frequent between members of the same conference than between members of different conferences. Teams that are geographically close to one another but belong to different conferences are more likely to play one another than teams separated by large geographic distances.

Applications

● Exploring adversarial networks (such as terrorist networks).

● Clustering in social networks.

● Mining social networking sites, where crawling politeness policies make it difficult to mine the whole network, and space and bandwidth limits constrain the size of the network that can be mined.

Challenges

● Dynamic discovery of the network complicates the clustering of nodes.

● Identifying the characteristic of a node (constant, changing, or rapidly changing) is a difficult problem.

● The dataset grows rapidly with network discovery, and keeping track of each node's probability distribution over the different communities is challenging.

Tools Used

● Third-party package (https://sites.google.com/site/santofortunato/inthepress2) for generating synthetic graphs as input.

● Languages: Python and Java, with packages such as pandas, NumPy, scikit-learn, NetworkX, and igraph used as needed.

● matplotlib for plotting the results, for better visualization and understanding.

Implementation

● There are two main modules: ChooseNode, which chooses the node to be merged into the network in each iteration, and UpdateCommunity, which updates the communities (clusters) starting from the chosen node.

● Spectral clustering is applied to the initial set of target nodes.

During ChooseNode, two measures are used to choose the node for the update:

● Ncut measure: minimizes the similarity across a cut while simultaneously maximizing the similarity within the same community.

● Modularity: the fraction of edges that fall within the given communities minus the fraction expected if edges were placed at random.
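To make the two measures concrete, here is a minimal sketch for an unweighted, undirected graph. It assumes the standard textbook forms (Ncut as the sum over communities of cut size divided by community volume, and Newman-Girvan modularity); the slides do not state which exact variants the project used, and the helper names and toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency dict from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def ncut(adjacency, communities):
    """Sum over communities of (edges leaving it) / (total degree inside it)."""
    total = 0.0
    for community in communities:
        members = set(community)
        cut = sum(1 for u in members for v in adjacency[u] if v not in members)
        volume = sum(len(adjacency[u]) for u in members)
        if volume > 0:
            total += cut / volume
    return total


def modularity(adjacency, communities):
    """Fraction of edges inside communities minus the expected fraction."""
    two_m = sum(len(neighbours) for neighbours in adjacency.values())
    if two_m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, so this is 2 * e_c.
        internal = sum(1 for u in members for v in adjacency[u] if v in members)
        degree = sum(len(adjacency[u]) for u in members)
        q += internal / two_m - (degree / two_m) ** 2
    return q


# Toy graph (assumption): two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = build_adjacency(edges)
good_split = [{0, 1, 2}, {3, 4, 5}]
bad_split = [{0, 1, 3}, {2, 4, 5}]
```

On this toy graph the split along the bridging edge gives a lower Ncut and a higher modularity than the split that mixes the two triangles, which is the behaviour a node-selection step would want to reward.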

Input:

● Initial clustering, initial network, cost, and budget.

Output:

● Final network, with the discovered nodes grouped into clusters.

● List of rapidly changing nodes in the network.
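The inputs and outputs above suggest a budgeted discovery loop, sketched below. This is not the project's code. The selection rule (most links into the known network), the update rule (majority vote of labelled neighbours), the `change_threshold` parameter, and the function name `discover` are all assumptions standing in for ChooseNode and UpdateCommunity, whose details are not given in these slides.

```python
from collections import Counter


def discover(full_graph, labels, cost, budget, change_threshold=2):
    """Grow the known network one node at a time until the budget runs out.

    full_graph: node -> set of neighbours (stands in for querying the network)
    labels:     node -> community id for the initially known nodes
    cost:       node -> cost of discovering that node
    Returns (final labels, sorted list of nodes whose label changed often).
    """
    labels = dict(labels)
    changes = Counter()
    while True:
        # Frontier: affordable, undiscovered neighbours of known nodes.
        frontier = {v for u in labels for v in full_graph[u]
                    if v not in labels and cost[v] <= budget}
        if not frontier:
            break
        # ChooseNode stand-in (assumption): most links into the known network.
        chosen = max(sorted(frontier),
                     key=lambda v: sum(1 for n in full_graph[v] if n in labels))
        budget -= cost[chosen]
        labels[chosen] = None
        # UpdateCommunity stand-in (assumption): relabel the chosen node and
        # its known neighbours by majority vote of their labelled neighbours.
        known_neighbours = sorted(n for n in full_graph[chosen] if n in labels)
        for node in [chosen] + known_neighbours:
            votes = Counter(labels[n] for n in full_graph[node]
                            if labels.get(n) is not None)
            if not votes:
                continue
            new_label = votes.most_common(1)[0][0]
            if labels[node] is not None and labels[node] != new_label:
                changes[node] += 1
            labels[node] = new_label
    changing = sorted(n for n, c in changes.items() if c >= change_threshold)
    return labels, changing


# Toy network (assumption): two triangles joined by the edge 2-3.
full_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
              3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
initial = {0: "A", 1: "A", 4: "B", 5: "B"}
unit_cost = {node: 1 for node in full_graph}
final_labels, changing_nodes = discover(full_graph, initial, unit_cost, budget=2)
```

With a budget of 2 and unit costs, the loop discovers nodes 2 and 3 and stops. A real implementation would replace both stand-in rules with the Ncut and modularity based selection described earlier.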

Results and Analysis

- We used Average Clustering Purity (ACP) and Average Clustering Entropy (ACE) to measure the effectiveness of our algorithm.

- Both measures are based on the fraction of nodes in a given cluster that belong to the same class.
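As an illustration of how such measures can be computed, the sketch below takes the purity of a cluster to be the fraction of its nodes in the dominant class, and its entropy to be the Shannon entropy of its class distribution. The unweighted average over clusters and the base-2 logarithm are assumptions; the slides do not give the exact formulas, so the numbers reported here may follow a different normalization.

```python
from collections import Counter
from math import log2


def purity_and_entropy(clusters, true_class):
    """Return (ACP, ACE) averaged over clusters, each cluster weighted equally.

    clusters:   list of lists of node ids
    true_class: node id -> ground-truth class label
    """
    purities, entropies = [], []
    for members in clusters:
        counts = Counter(true_class[node] for node in members)
        fractions = [count / len(members) for count in counts.values()]
        # Purity: share of the cluster taken by its most common class.
        purities.append(max(fractions))
        # Entropy: 0 when the cluster contains a single class.
        entropies.append(-sum(p * log2(p) for p in fractions))
    k = len(clusters)
    return sum(purities) / k, sum(entropies) / k


# Hypothetical toy labelling: one mixed cluster and one pure cluster.
clusters = [[0, 1, 2, 3], [4, 5]]
true_class = {0: "X", 1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y"}
acp, ace = purity_and_entropy(clusters, true_class)
```

Higher purity and lower entropy both indicate clusters that agree more closely with the ground-truth classes.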

Railway dataset:

Total number of target nodes: 80

Average cluster purity: 0.79

Average cluster entropy: 0.17

Rapidly changing nodes: 6, 47, 84, 91

Football dataset:

Total number of target nodes: 48

Average cluster purity: 0.91

Average cluster entropy: 0.11

Changing nodes: 51, 63, 49

References

Research paper: On Integrating Network and Community Discovery

http://hanj.cs.illinois.edu/pdf/wsdm15_jliu.pdf

Thank you!