Fuzzy analysis of community detection in complex networks

Physica A 389 (2010) 5319–5327


Dawei Zhang a, Fuding Xie a,b,∗, Yong Zhang a, Fangyan Dong b, Kaoru Hirota b

a Department of Computer Science, Liaoning Normal University, Liaoning Dalian 116081, PR China
b Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Japan

Article info

Article history:
Received 1 December 2009
Received in revised form 8 July 2010
Available online 21 July 2010

Keywords:
Complex network
Community
Cluster
Quantitative condition

Abstract

A snowball algorithm is proposed to find community structures in complex networks by introducing the definition of a community core and some quantitative conditions. A community core is first constructed, and then its neighbors that satisfy the quantitative conditions are tied to this core until no more nodes can be added. All communities in the network are then obtained, one by one, by repeating this process. The use of local information in the proposed algorithm directly reduces its complexity. The algorithm runs in $O(n+m)$ time for a general network and $O(n)$ for a sparse network, where $n$ is the number of vertices and $m$ is the number of edges in the network. The algorithm quickly produces the desired results when applied to search for communities in a benchmark and five classical real-world networks, which are widely used to test community detection algorithms for complex networks. Furthermore, unlike existing methods, neither global modularity nor local modularity is utilized in the proposal. By converting the considered problem into a graph, the proposed algorithm can also be applied to solve other clustering problems in data mining.

© 2010 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Department of Computer Science, Liaoning Normal University, Liaoning Dalian 116081, PR China. Tel.: +86 411 8599 2418; fax: +86 411 8215 8451.
E-mail address: [email protected] (F. Xie).

0378-4371/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2010.07.016

1. Introduction

Detecting communities in complex networks is of considerable importance for understanding both the structures and the functions of a network. Conventionally, a community is taken to be a group of vertices in which there are more edges between vertices within the group than to vertices outside of it. Although the notion of community is straightforward and the partitioning of a network into such groups is a well-studied problem, the design of methods to partition a network into several meaningful, highly interconnected components is highly nontrivial.

It is noted that the definition of community mentioned above is qualitative and not quantitative. That is to say, the definition is only a principle: it does not clearly state which vertices are in the same cluster due to high connectivity. To deal with this problem, Wasserman and Faust [1] introduced the LS set. The LS set is defined as a set of nodes in which each of its subsets has more ties to its complement within the set than outside. The LS set definition is quite stringent. Moreover, it is a very tough problem to detect all the LS sets in a network. In order to relax the constraints, Radicchi et al. [2] proposed the strong and weak definitions. In a strong community, each node has more connections within the community than with the rest of the network, and in a weak community the sum of all degrees within the community is larger than the sum of all degrees toward the rest of the network. Based on these definitions, a self-contained algorithm, similar to the GN algorithm, was developed for finding strong or weak communities in a network. By comparing the above definitions, Hu et al. [3] proposed a comparative definition for community in networks. A community is defined as a set of nodes which satisfies the requirement that the degree of each node inside the community is not smaller than the node’s degree toward any other community. This definition lies between the strong and weak definitions. The time complexity of the algorithm they proposed is $O(n^2)$. Hu’s definition tells us quantitatively how to tie a node to a known set.

For a community detected by any algorithm, one can easily decide to which type (strong community, weak community or Hu’s definition) it belongs. Usually, however, it is difficult to discover the community structures in networks quickly by using these definitions. In this study, by analyzing the definition of the strong community and Hu’s definition, an algorithm with time complexity $O(n+m)$ is developed with the help of the proposed rules, where $n$ is the number of vertices and $m$ is the number of edges in a network. We first find a node with a maximal degree and its neighbors in the network, and a community core is constructed by a definition similar to that of the strong community. Then, the vertices satisfying the conditions join the community core one by one until no more vertices can be added. Finally, by repeating this process on the remaining vertices, the other communities are obtained. Tests on five typical real-world networks and a benchmark reveal that the proposed algorithm produces the desired results, which supports the rationality of the proposal. After properly revising the conditions, the proposed method is easily extended to weighted networks.
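The three quantitative definitions recalled in this section can be checked mechanically for a given candidate set. The following is a minimal Python sketch, not code from the cited papers: the network is assumed to be undirected and stored as a dictionary of neighbor sets, and the function names `is_strong`, `is_weak` and `satisfies_hu` are illustrative only.

```python
def internal_external(adj, community, node):
    """Return (links inside the community, links leaving it) for one node."""
    inside = len(adj[node] & community)
    return inside, len(adj[node]) - inside

def is_strong(adj, community):
    """Strong sense: every node has more links inside than outside."""
    return all(i > o for i, o in
               (internal_external(adj, community, v) for v in community))

def is_weak(adj, community):
    """Weak sense: the summed internal degree exceeds the summed external degree."""
    pairs = [internal_external(adj, community, v) for v in community]
    return sum(i for i, _ in pairs) > sum(o for _, o in pairs)

def satisfies_hu(adj, community, others):
    """Comparative sense: each node has at least as many links inside its own
    community as toward any single other community."""
    return all(len(adj[v] & community) >= len(adj[v] & other)
               for v in community for other in others)

# Toy network (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On this toy network the set {0, 1, 2} is a strong community, whereas {0, 1, 2, 3} is only weak, because node 3 has one link inside and two outside; for the same reason it fails the comparative check against {4, 5}.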

2. Related work

In recent years, complex networks have been widely investigated in physics, mathematics and computer science. One property that has attracted particular attention is the community structure of networks. A key problem in this field is how to detect communities in a network quickly and accurately. The prominent algorithms include the Kernighan–Lin algorithm [4], the spectral bisection algorithm [5], and the GN algorithm [6]. Many other algorithms that aim at discovering reasonable divisions of networks have also been reported in the literature. Capocci et al. [7] developed an algorithm to detect community structure in complex networks; it is based on the spectral method and takes into account weights and link orientations. Using the concept of network communicability, Estrada and Hatano [8] defined communities in a complex network and transformed the problem of finding the network communities into an all-clique problem of the communicability graph. Based on a distance measure, Zhou [9] calculated the dissimilarity index between nearest-neighboring vertices of a network and designed an algorithm to partition these vertices into communities that are hierarchically organized. A network flow algorithm based on fundamental principles of graph theory was introduced to identify the sparsest cuts and an underlying hierarchical community structure of the network via maximum concurrent flow [10]. In analogy with the standard clustering coefficient in binary networks, a definition of the clustering coefficient for bipartite networks based on the fraction of squares was proposed by Zhang et al. [11]. Based on a measure of similarity among community structures, the accuracy and precision of three algorithms were investigated [12]. Focusing on the global clustering coefficient, a measure originally defined for unweighted networks, Opsahl and Panzarasa generalized this coefficient to weighted networks [13]. A general spectral method was proposed to find communities of a network based on network complement and anti-community concepts [14].

To measure the division quality of a network, Newman [15] introduced the modularity $Q$, a measure for a specific division of a network into communities. A larger modularity corresponds to a better detection of community structures. In 2008, Shen et al. [16] defined a community recursive coefficient (CRC), denoted by $M$, instead of $Q$ (modularity) to quantify the effect of the splitting results. They proved that a recursive optimization of the local $M$ is equivalent to acquiring the maximal global $Q$ value corresponding to good divisions. A very fast algorithm for detecting community structures in complex networks is proposed in Ref. [17]. The algorithm is based on a table that describes a network and a virtual cache similar to the cache in a computer architecture. Wang et al. [18] reported a fast and efficient heuristic algorithm for detecting community structures in complex networks in 2009.

Another good idea for evaluating and finding the communities in a network is the local modularity proposed by Chen et al. [19]. Using local modularity generally increases the computation speed because only local information in the network is involved.

Given the relevance of the problem, it is crucial to construct efficient procedures and algorithms for the identification of the community structure in a generic network. This task, however, is highly nontrivial. Therefore, it is worthwhile to further investigate this problem.
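Since modularity is used in this section as the yardstick for division quality, and modularity values are reported for some of the test networks later on, a small sketch of how $Q$ can be evaluated may be helpful. The text does not state the formula; the sketch assumes the widely used form $Q = \sum_c \left[ l_c/m - (d_c/2m)^2 \right]$ for an undirected, unweighted network, where $l_c$ is the number of edges inside community $c$, $d_c$ is the sum of the degrees of its nodes and $m$ is the total number of edges. The function name `modularity` and the toy network are illustrative.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted network.

    adj maps each node to the set of its neighbors; communities is a list
    of disjoint node sets covering the network.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # total number of edges
    q = 0.0
    for c in communities:
        # every internal edge is counted from both endpoints, hence the / 2
        internal_edges = sum(len(adj[v] & c) for v in c) / 2
        degree_sum = sum(len(adj[v]) for v in c)
        q += internal_edges / m - (degree_sum / (2 * m)) ** 2
    return q

# Toy network (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])    # 2 * (3/7 - 1/4) = 5/14
q_whole = modularity(adj, [set(adj)])                # a single community gives 0
```

Splitting the toy network into its two triangles gives $Q = 5/14 \approx 0.357$, while keeping all nodes in one community gives $Q = 0$, in line with the statement that a larger $Q$ indicates a better division.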

3. Depiction of the proposed snowball algorithm

Suppose there is a network $G = (V, E)$ with $n$ nodes, represented mathematically by an adjacency matrix $A$ with elements $A_{ij} = 1$ if there is an edge from $v_i$ to $v_j$ and $A_{ij} = 0$ otherwise.

Detecting communities in a network means partitioning its nodes into different groups. By investigating a number of real networks and many algorithms, we find that some nodes are always in the same community no matter which algorithm is applied. These nodes exhibit high agglomeration, and therefore form the core of a community.
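As a concrete illustration of this representation, the sketch below builds $A$ from an edge list and derives node degrees and neighbor sets from it. It assumes an undirected, unweighted network with nodes numbered from 0; the function name `adjacency_matrix` and the four-node example are illustrative.

```python
def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 matrix A of an undirected, unweighted network."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1          # undirected: A is symmetric
    return a

# Hypothetical four-node network: a triangle 0-1-2 with a pendant node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)
degrees = [sum(row) for row in A]                               # d_i = sum_j A_ij
neighbors = [{j for j, x in enumerate(row) if x} for row in A]  # neighbor sets
```

Neighbor sets are often more convenient than the full matrix for local procedures, because the intersections used later to compare node neighborhoods become plain set operations.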

Definition. The subset $Cr$ is a core of a community $C \subseteq V$ if we have

$$\sum_{v_j \in Cr} A_{ij} > \sum_{v_j \in C - Cr} A_{ij}.$$


Obviously, this definition is equivalent to the strong definition introduced by Radicchi et al. [2] if $C = V$. Otherwise, it indicates the strong structure inside a community. When $C$ is a strong community, $C$ is also its own core. This definition can be thought of as a generalization of the strong definition.
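The core condition can be tested directly on a candidate subset. The sketch below is a hedged illustration rather than the authors' code: it assumes that the inequality is required for every node $v_i$ of the candidate core (the reading suggested by the comparison with the strong definition), that the core is a subset of the community, and that the network is stored as neighbor sets. The name `is_core` is illustrative.

```python
def is_core(adj, community, core):
    """Check the core inequality for every node v_i of the candidate core:
    links into the core must outnumber links into the rest of the community."""
    rest = community - core
    return all(len(adj[v] & core) > len(adj[v] & rest) for v in core)

# Toy network (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0, 1, 2, 3}
```

Here {0, 1, 2} passes as a core of {0, 1, 2, 3}, while {2, 3} does not, because node 2 has one link into {2, 3} and two into the rest of the community. With the community set to the whole node set, the check coincides with the strong definition.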

For a given community in a network, it is easy to determine which nodes make up its core. Conversely, is it possible to construct a community from its core? The high agglomeration of the community core and Hu’s definition imply that it is possible to find a community in a network once we obtain its core. Now, the key point is how to search for a community core. By analyzing the relationships among the nodes, links and communities in a network, one can find that nodes with high degree have more powerful agglomeration than those with low degree. Unlike Hu’s method of initially setting each node and a random half of its neighbors to be a community [3], we always select the node with a maximal degree and all its neighbors to construct the community core in the network. The snowball algorithm is described in detail as follows.

Initially, all nodes in a network are unlabeled.

I: Find the community core.
The node $v_i$ with a maximal degree and its unlabeled neighbors in $V$ are first found. $N_i$ denotes the set of unlabeled neighbors of $v_i$. For each node $v_j$ in $N_i$, the value

$$\beta_j = \frac{|N_i \cap N_j| + 1}{d_j}$$

is computed, where $d_j$ is the degree of $v_j$. The nodes $v_j$ with $\beta_j > 0.5$ constitute the community core $Cr_i$. If $|Cr_i| / d_i > 0.5$ and $|Cr_i| \geq 3$, a community label is attached to the nodes in $Cr_i$ accordingly. Otherwise, the node $v_i$ is labeled with the symbol ‘W’ (waiting to be determined later). In this case, this step is repeated with $V = V - \{v_i\}$.

II: Add nodes.
Let $N$ be the set of neighbors of $Cr_i$. Find an unlabeled node $v_k$ in $N$. $N'_k$ denotes the set of neighbors of $v_k$. The node $v_k$ is moved into $Cr_i$ and labeled if one of the following conditions is satisfied.

(1) $|Cr_i \cap N'_k|$ is equal to or greater than $d_k / 2$.

(2) $|Cr_i \cap N'_k|$ is equal to or greater than each of $|C_g \cap N'_k|$ and $|S_u \cap N'_k|$, where $C_g$ is a detected community with label $g$ and $S_u$ is the set of unlabeled nodes in $V$.

Let $Cr_i = Cr_i \cup \{v_k\}$, and update $N$ by moving into $N$ those neighbors of $v_k$ which are not in $Cr_i$. Repeat this process in $N$ until no node satisfies condition (1) or (2).

Let $V = V - Cr_i$, and go to step I while $V$ is not empty.

III: Dispose of the nodes labeled ‘W’.
After all nodes have been labeled, if no node is labeled ‘W’, then all the communities in the network have been found. Otherwise, the following steps put these nodes into proper communities.

(1) Compute $\alpha_s = |N'_s \cap S_w| / d_s$ for each node $v_s$ in $S_w$, where $S_w$ is the set of nodes labeled ‘W’ in the network. Search for the node $v_s$ with minimal $\alpha_s$ in $S_w$.

(2) Move $v_s$ into community $g$ if $|C_g \cap N'_s| > |C_i \cap N'_s|$ for all detected communities $i \neq g$. If several communities $g$ tie for the largest value, compute $D_g = \sum d_t$, where $v_t$ runs over $C_g \cap N'_s$; node $v_s$ is then moved into the community $g$ whose $D_g$ is maximal. If these values also tie, node $v_s$ is randomly put into one of them.

(3) Let $S_w = S_w - \{v_s\}$ and repeat this process until the set $S_w$ is empty.
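Steps I to III can be put together in a compact sketch. This is a reading of the description under stated assumptions, not the authors' Java implementation: the network is undirected and stored as neighbor sets; the thresholds of step I are tested on the neighbors with $\beta_j > 0.5$, and the seed node itself is then added to the core (as in the karate club example, where the core contains node 34); nodes labeled ‘W’ are not candidates in step II; ties in the choice of the seed are broken by the smallest node number. The function names and the test network are hypothetical, and the simple rescanning of the frontier does not reproduce the bookkeeping needed for the stated running time.

```python
import random

def find_core(adj, unlabeled):
    """Step I: return (seed, core); core is None when the seed is labeled 'W'."""
    seed = max(sorted(unlabeled), key=lambda v: len(adj[v]))
    n_i = adj[seed] & unlabeled                      # unlabeled neighbors N_i
    members = {j for j in n_i
               if (len(n_i & adj[j]) + 1) / len(adj[j]) > 0.5}   # beta_j > 0.5
    if len(members) >= 3 and len(members) / len(adj[seed]) > 0.5:
        return seed, members | {seed}                # assumption: seed joins its core
    return seed, None

def grow(adj, core, detected, unlabeled):
    """Step II: absorb frontier nodes that satisfy condition (1) or (2)."""
    community = set(core)
    free = set(unlabeled) - community                # S_u: still unlabeled nodes
    changed = True
    while changed:
        changed = False
        frontier = {u for v in community for u in adj[v]} & free
        for k in sorted(frontier):
            inside = len(adj[k] & community)
            rivals = [len(adj[k] & c) for c in detected] + [len(adj[k] & free)]
            if inside >= len(adj[k]) / 2 or inside >= max(rivals):
                community.add(k)
                free.discard(k)
                changed = True
    return community

def place_waiting(adj, communities, waiting, rng=random):
    """Step III: move each 'W' node into one of the detected communities."""
    waiting = set(waiting)
    while waiting and communities:
        s = min(sorted(waiting),                     # node with minimal alpha_s
                key=lambda v: len(adj[v] & waiting) / max(len(adj[v]), 1))
        links = [len(adj[s] & c) for c in communities]
        best = [g for g, n in enumerate(links) if n == max(links)]
        if len(best) > 1:                            # tie: compare the sums D_g
            d = {g: sum(len(adj[t]) for t in adj[s] & communities[g]) for g in best}
            best = [g for g in best if d[g] == max(d.values())]
        communities[rng.choice(best)].add(s)         # random only if still tied
        waiting.discard(s)
    return communities

def snowball(adj):
    """Run steps I and II until every node is labeled, then step III."""
    unlabeled, communities, waiting = set(adj), [], set()
    while unlabeled:
        seed, core = find_core(adj, unlabeled)
        if core is None:
            waiting.add(seed)                        # label 'W'
            unlabeled.discard(seed)
            continue
        community = grow(adj, core, communities, unlabeled)
        communities.append(community)
        unlabeled -= community
    return place_waiting(adj, communities, waiting)

# Hypothetical test network: two 4-cliques joined by edge 3-4, a tail 0-8-9,
# and a triangle 10-11-12 hanging on nodes 2 and 4.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4), (0, 8), (8, 9),
         (10, 2), (10, 4), (10, 11), (10, 12), (11, 12)]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
result = snowball(adj)
```

On this network the sketch first builds a core around node 4, then one around node 0, whose community absorbs the tail nodes 8 and 9 in step II. Nodes 10, 11 and 12 cannot form a core and are labeled ‘W’; in step III node 10 ties between the two communities and is placed by the degree sum $D_g$, after which nodes 11 and 12 follow it.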

The proposed algorithm always extracts the best community from a found node and its neighbors’ information. The procedure of detecting a community is similar to rolling a snowball, so the algorithm is called the snowball algorithm.

The running time of the proposed algorithm mainly depends on the computational demand of steps I and II. It is trivial to find a node with a maximal degree in $O(n)$. The time needed to construct a community core $Cr_i$ is $O(d_i + 2m_i)$, where $d_i$ is the degree of $v_i$ and $m_i$ is the number of edges in $Cr_i$. The process in step II is similar to the one reported in Ref. [18]. Thus, the time consumption of this step is approximately $O(k_i^2 n_i)$, where $n_i$ is the total number of vertices in community $i$ and $k_i$ is the mean vertex degree in the extracted community [18]. The running time of the proposed algorithm is approximately $O(n+m)$, where $n$ is the number of vertices and $m$ is the number of edges in the network.

4. Experimental results

To evaluate the performance of the proposed algorithm, the snowball algorithm is implemented in Java, using the Eclipse RCP IDE, on a PC with a 2.66 GHz duo processor and 2 GB of memory. A benchmark and five classical real-world networks whose community structures are known have been tested. As a result, reasonable communities in these networks have been found. Furthermore, the proposed algorithm is compared with the extremal optimization algorithm, the GN fast algorithm, the spectral optimization algorithm and the tabu search algorithm in terms of accuracy, computation time and number of communities. The experimental results show that the proposal is acceptable.

4.1. Zachary’s karate club network

The famous karate club network analyzed by Zachary [20] is widely used as a test example for algorithms that detect communities in complex networks. It consists of 34 nodes and 78 edges, as shown in Fig. 1. Due to a conflict between the instructor and the administrator of the club, the club split into two smaller ones. Applying the proposed algorithm to this network, node 34 and its 17 neighbors are first found, and then the community core $Cr_1$ is obtained, which consists of node 34 and its neighbors except nodes 10, 14, 20, 28, and 32. Nodes 10, 28, 32, 3, 26, and 25 enter this core one after the other in step II, and the first community is detected. The rest of the nodes in the network form another community by executing step I and step II.

All nodes except node 3 in this network are classified correctly by the proposal. This node shares the same number of links with each community, and it seems that the two communities overlap at this node. Thus, it is reasonable to put node 3 into either community.

Fig. 1. The community structure in Zachary’s karate club network.

4.2. College football network

The second test uses the college football network, which represents the 2000 season schedule of Division I games in the USA. In this network, the nodes represent 115 teams and the links represent 613 games played in the course of the year. The teams are divided into 12 groups of 8–12 teams [6,21]. The difference between this network and the above example is that there are no apparent center nodes: the degrees of the nodes vary from 7 to 12. Ten communities are detected by the proposal, as shown in Fig. 2. Step III of the proposed algorithm is executed 33 times; that is, 33 nodes are labeled with ‘W’. The modularity of this division is 0.5919. This result reveals that the proposed algorithm is not good enough to deal with networks with non-apparent community structures.

Fig. 2. Ten communities in the College football network.


Fig. 3. Community structure in the social network of bottlenose dolphins extracted by the proposed algorithm.

4.3. Dolphin social network

In this example, a network composed of 62 bottlenose dolphins is investigated. This network was assembled and studied by Lusseau et al. [22]. The nodes in this network represent 62 bottlenose dolphins living in Doubtful Sound, New Zealand, with social ties between dolphin pairs established by direct observation of statistically significant frequent association over a period of several years. The network splits naturally into two subgroups. Partitions of this network into two or four communities are reported in the literature [15,19,23]. The proposed method divides this network into four communities. It is clear that this is not the natural community division of the network. The modularity of this division is 0.5126. At the bottom right of Fig. 3, the group colored green is one of the two real groups; it consists of 21 bottlenose dolphins and has the fewest relations with other communities. The other real group splits into three subgroups in this study. This may indicate the evolution trend of this real group over time.

4.4. Les Misérables network

Fig. 4 depicts the Les Misérables network compiled by Knuth [15,24]. This network reflects the interactions between major characters in the novel Les Misérables written by Victor Hugo. In this network, the nodes represent characters and an edge between two vertices represents the simultaneous appearance of both characters in one or more scenes. Four communities in this network are detected by the proposed method. The communities basically reflect the subplot structure of the book. One can see that Jean Valjean (node 11) and the police officer Javert (node 27) are clearly the centers of the largest community, and Myriel (node 0) leads a community with eight nodes.

4.5. Books on the American politics network

V. Krebs’ network of books on American politics was introduced by Newman [25]. In this network, the nodes represent 105 recent books on American politics bought from the on-line bookseller Amazon.com, and edges join pairs of books that are frequently purchased by the same buyer. Fig. 5 shows the result of feeding this network through the proposed algorithm. The right community in Fig. 5 consists almost entirely of liberal books; the exceptions are four centrist books and one conservative book. It is easily seen that node 77 is connected more closely with the right community than with the left one, although it should be put in the left community. Unlike the liberal and conservative books, the community composed of centrist books is obviously split incorrectly because centrist books have no apparent center.

4.6. The comparison of the results

In this section, we compare the results obtained by the proposed algorithm (PA) with those acquired by four well-known algorithms: the extremal optimization algorithm (EO) [26], the GN fast algorithm (GNFA) [15], the spectral optimization algorithm (SO) [25] and the tabu search algorithm (TS) [27]. From Table 1, one can easily see that the proposed algorithm runs fastest among these algorithms on these examples. Table 2 lists the numbers of communities detected by the five algorithms. For the Zachary network and the Krebs network, the number of communities extracted by our algorithm agrees exactly with the actual cases, and the results obtained by the proposal are acceptable for the remaining three networks.

Fig. 4. The network of interactions between major characters in the novel Les Misérables by Victor Hugo.

Fig. 5. Krebs’ network of books on American politics.


Table 1
Time cost of several algorithms (ms).

                  EO     GNFA   SO     TS     PA
Dolphins          1732   51     63     386    17
Football          6833   50     106    960    7
Zachary           1079   47     118    212    1
Les Misérables    2675   48     85     392    1
Krebs             3257   48     78     1357   1

Table 2
The numbers of communities detected by several algorithms.

                  EO     GNFA   SO     TS     PA     Original
Dolphins          4      4      5      4      4      2
Football          10     7      8      10     10     12
Zachary           4      3      4      4      2      2
Les Misérables    7      5      8      7      4      5
Krebs             5      4      4      8      3      3

Fig. 6. The left figure shows the run times of the proposed algorithm and the other four algorithms on the benchmark with 500 nodes. To clearly distinguish our algorithm from the spectral optimization algorithm, the right figure shows the relation between these two.

In Ref. [28], Lancichinetti et al. introduced an artificial network in which both the degree and the community size distributions are power laws, with exponents $\alpha$ ($\alpha = 2, 3$) and $\beta$ ($\beta = 1, 2$), respectively. Here we choose $\alpha = 3$, $\beta = 2$, average degree = 10 and max degree = 250. Fig. 6 shows the run times of our algorithm and the other four algorithms on this benchmark with 500 nodes. The proposal is faster than the spectral optimization method except at $P_{out} = 0.375$ and $0.4375$. Compared with TS and GNFA, our algorithm runs very fast. Fig. 7 shows that the number of communities obtained by our algorithm is close to the original number of communities when $P_{out} \leq 0.25$. From Fig. 8, one finds that the partition accuracy of the proposed algorithm on this network is greater than 80% when $P_{out} \leq 0.25$. The partition accuracy decreases sharply when $P_{out} \geq 0.3125$, because the community structures become less apparent as $P_{out}$ grows. These results show that the proposed algorithm can quickly and accurately cope with the problem of detecting communities in networks with an apparent cluster structure.

Fig. 7. The relationship between the numbers of communities detected by different algorithms and the original one.

Fig. 8. The curves correspond to the partition accuracies obtained by different algorithms as $k_{out}$ varies.

5. Discussion and conclusion

To detect community structures in a network, global modularity and local modularity are widely used in a great number of algorithms reported in the literature. Generally speaking, the use of global modularity improves the accuracy of divisions in a network because it exploits global information. To extract the community structure of networks quickly, a local modularity measure is utilized, since this measure depends only on local information about the known nodes and their neighbors, which reduces the running time of algorithms. However, a random strategy is usually applied when choosing the initial node or neighbors [18], so the results obtained may differ from run to run.

In this paper, a fast way to detect the communities in a network is introduced. The difference between our proposal and the existing works in the literature is that we use neither global modularity nor local modularity to determine community structures. To partition a network into the ideal communities, the problem of choosing the initial vertices is solved by always searching for the node with a maximal degree. The proposed method always extracts the best community around a known vertex and its neighbors. The algorithm is considerably faster than most previously reported algorithms, and allows us to analyze the community structure of networks that were considered too large to be tractable in the past. The proposal is applied to detect community structures in five classical networks and a benchmark. The results indicate competitive performance.

Acknowledgements

We would like to thank the anonymous referee for his/her valuable suggestions. The first two authors would like to thank Dr. Hrvoje Markovic for his kind help. This work is supported by the National Natural Science Foundation of China (Grant No. 10771092).

References

[1] S. Wasserman, K. Faust, Social Network Analysis, Cambridge University Press, Cambridge, UK, 1994.
[2] F. Radicchi, C. Castellano, F. Cecconi, et al., Defining and identifying communities in networks, Proc. Natl. Acad. Sci. USA 101 (2004) 2658–2663.
[3] Y.Q. Hu, H.B. Chen, P. Zhang, et al., Comparative definition of community and corresponding identifying algorithm, Phys. Rev. E 78 (2008) 026121.
[4] B.W. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graphs, Bell Syst. Tech. J. 49 (1970) 291–308.
[5] J. Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM 46 (1999) 604–632.
[6] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (2002) 7821–7826.
[7] A. Capocci, V.D.P. Servedio, G. Caldarelli, F. Colaiori, Detecting communities in large networks, Physica A 352 (2005) 669–676.
[8] E. Estrada, N. Hatano, Communicability graph and community structures in complex networks, Appl. Math. Comput. 214 (2009) 500–511.
[9] H. Zhou, Distance, dissimilarity index, and network community structure, Phys. Rev. E 67 (2003) 061901.
[10] C.F. Mann, D.W. Matula, E.V. Olinick, The use of sparsest cuts to reveal the hierarchical community structure of social networks, Social Netw. 30 (2008) 223–234.
[11] P. Zhang, J. Wang, X. Li, et al., Clustering coefficient and community structure of bipartite networks, Physica A 387 (2008) 6869–6875.
[12] Y. Fan, M. Li, P. Zhang, et al., Accuracy and precision of methods for community identification in weighted networks, Physica A 377 (2007) 363–372.
[13] T. Opsahl, P. Panzarasa, Clustering in weighted networks, Social Netw. 31 (2009) 155–163.
[14] M. Zarei, K.A. Samani, Eigenvectors of network complement reveal community structure more accurately, Physica A 388 (2009) 1721–1730.
[15] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Phys. Rev. E 69 (2004) 066133.
[16] Y. Shen, W. Pei, K. Wang, et al., Recursive filtration method for detecting community structure in networks, Physica A 387 (2008) 6663–6670.
[17] A. Clauset, M.E.J. Newman, C. Moore, Finding community structure in very large networks, Phys. Rev. E 70 (2004) 066111.
[18] X.T. Wang, G.R. Chen, H.T. Lu, A very fast algorithm for detecting community structures in complex networks, Physica A 384 (2007) 667–674.
[19] D.B. Chen, Y. Fu, M.S. Shang, A fast and efficient heuristic algorithm for detecting community structures in complex networks, Physica A 388 (2009) 2741–2749.
[20] W.W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropol. Res. 33 (1977) 452–473.
[21] S.H. Zhang, R.S. Wang, X.X. Zhang, Identification of overlapping community structure in complex networks using fuzzy c-means clustering, Physica A 374 (2007) 483–490.
[22] D. Lusseau, K. Schneider, O.J. Boisseau, et al., The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Can geographic isolation explain this unique trait? Behav. Ecol. Sociobiol. 54 (2003) 396–405.
[23] M.E.J. Newman, Finding community structure in networks using the eigenvectors of matrices, Phys. Rev. E 74 (2006) 036104.
[24] D.E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993.
[25] M.E.J. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. USA 103 (2006) 8577–8582.
[26] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Phys. Rev. E 72 (2005) 027104.
[27] A. Arenas, A. Fernandez, S. Gomez, Analysis of the structure of complex networks at different resolution levels, New J. Phys. 10 (2008) 053039.
[28] A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Phys. Rev. E 78 (2008) 046110.
