
A Collaborative Fuzzy Clustering Algorithm in Distributed Network Environments

Jin Zhou, Student Member, IEEE, C. L. Philip Chen, Fellow, IEEE, Long Chen, Member, IEEE, and Han-Xiong Li, Fellow, IEEE


Abstract—Due to privacy and security requirements or technical constraints, it is difficult for traditional centralized approaches to perform data clustering in a large, dynamic, distributed peer-to-peer network. In this paper, a novel collaborative fuzzy clustering algorithm is proposed, in which the centralized clustering solution is approximated by performing distributed clustering at each peer with the collaboration of other peers. The required communication links are established at the level of cluster prototypes and attribute weights, and information exchange only occurs between topologically neighboring peers. An attribute-weight-entropy regularization technique is applied in the distributed clustering method to achieve an ideal distribution of attribute weights, which ensures good clustering results and allows the important features to be extracted for high-dimensional data clustering. The kernelization of the proposed algorithm is also realized as a practical tool for clustering data with ‘non-spherical’ shaped clusters. Experiments on synthetic and real-world datasets demonstrate the efficiency and superiority of the proposed algorithms.

Index Terms—Distributed peer-to-peer network, Collaborative clustering, Subspace clustering, Kernel-based clustering

I. INTRODUCTION

PROTOTYPE-BASED partitioning clustering is an essential machine learning technique used for data mining, pattern recognition and statistical analysis [1-3]. It partitions the data into clusters according to the similarities between objects and helps in extracting new information or discovering new patterns. In the past few decades, mainstream data clustering techniques have basically been based on centralized operation, i.e., data sets are of small, manageable sizes, usually reside on one central site, and a single process performs the clustering on the data. The k-means algorithm [4] and the fuzzy c-means (FCM) algorithm [5] are two well-known centralized clustering algorithms. Variants of these two algorithms are further discussed and popularized in [6-11].

This work was supported in part by the National 973 Basic Research Program of China under Grant No. 2011CB302801 and the Macau Science and Technology Development Fund under Grant No. 008/2010/A1.

J. Zhou is with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China, and also with the School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China (e-mail: [email protected]).

C. L. Philip Chen and L. Chen are with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China (e-mail: [email protected]; [email protected]).

H. X. Li is with the Department of Systems Engineering and Engineering Management, City University of Hong Kong, Hong Kong, China (e-mail: [email protected]).

However, with the continuous growth of data on distributed networks, the traditional centralized clustering methods have shown their weaknesses: (1) the infeasibility of collecting data at a central site due to privacy and security requirements or technical constraints, such as the energy consumption and bandwidth limitations in wireless sensor networks (WSNs) [12-13]; (2) the high computational complexity on huge data sets. Performing clustering in distributed environments, i.e., distributed clustering [14], is in urgent demand.

Traditionally, there are two main architectures for distributed clustering: facilitator-worker and peer-to-peer (P2P) [15]. The former assumes a central processing unit (the facilitator) that coordinates all data sites (the workers). A related field with significant overlap with this kind of distributed clustering is parallel clustering [16], which usually follows single-program multiple-data (SPMD) parallelism with a master-slave architecture [17]. Cluster prototype messages are exchanged between the master and slave processes through a message passing interface, and multiple rounds of communication are performed to achieve globally consistent clusters across all data sites, as if the data from all sites were pooled into a central location for centralized clustering. More than a decade ago, many such approaches were proposed, and scalable, high-performance clustering solutions have been readily achieved [18-21]. However, this kind of clustering method assumes that the clusters are the same across all data sites at each iteration, and a central control unit is required to be in charge of collecting, processing and handing out the cluster prototype messages. In addition, there exist applications whose data sources are distributed over a large network with no special central control, such as a P2P network, for which P2P distributed clustering approaches should be considered [14-15].

In a P2P network, each peer (data site) has equal functionality: a peer is a facilitator and a worker at the same time. A large number of peers are connected in an ad-hoc way, where each peer can join and leave the network dynamically, and each peer can communicate with others according to the network structure. P2P distributed clustering algorithms aim to achieve locally optimized clusters at each peer, taking into consideration the local data at this peer and the necessary information (such as cluster prototypes or partition matrices [12][22]) exchanged from others. In recent years, many works have paid attention to this line of research. The P2P K-means algorithm, proposed by Datta et al. [23], is one of the first algorithms developed for P2P systems. Kashef et al. [24] present a distributed cooperative clustering method in a two-tier hierarchical P2P network, but a local super-peer is needed to aggregate the solutions from its ordinary peers. In [25], a good solution for distributed clustering in WSNs is proposed by capitalizing on a consensus-based formulation and parallel optimization tools. Recently, Pedrycz et al. [22] introduced the concept of collaborative fuzzy clustering, where the summarized knowledge structures in different peers are shared by communicating information granules, while Coletta et al. [26] extended Pedrycz’s method to optimize parameters including the interaction level for all pairs of peers and the number of clusters at each peer. However, these collaborative approaches assume fully connected network structures and show limitations for applications with large, dynamic networks.

In this paper, we propose a novel collaborative clustering algorithm over a distributed P2P network. This algorithm searches for the optimized clusters at each peer by collaborating only with topologically neighboring peers (exchanging cluster prototype and attribute weight messages) step by step, until the global consensus of all peers is reached. Under the premise of clustering performance similar to that of centralized clustering methods, it reduces and evens out the communication overhead among peers. The proposed algorithm can also conduct high-dimensional sparse data clustering and ‘non-spherical’ shaped data clustering, which are not considered by other distributed methods but arise widely in practical applications. For high-dimensional sparse data clustering, the cluster structure in the dataset is often limited to a subset of features rather than the entire feature set. A better solution is to introduce proper attribute weights into the clustering process according to the importance of different dimensions for cluster identification, which is referred to as soft subspace clustering [27]. Some fuzzy weighting subspace clustering [28-30] and entropy weighting subspace clustering [31-33] approaches have been proposed to address this issue. In our proposed collaborative clustering algorithm, the attribute-weight-entropy regularization technique is applied to achieve an ideal distribution of attribute weights that is consistent with the available data; thus optimal clustering results are obtained and the important features are extracted for cluster identification. For ‘non-spherical’ shaped data clustering, developments in kernel techniques and their applications have emphasized the need to incorporate kernel methods into data clustering, which is referred to as kernel-based clustering [34-37]. Therefore, the kernelization of the proposed algorithm is realized for clustering data with ‘non-spherical’ shaped clusters. In the experiments on synthetic and real-world datasets, the proposed algorithm and its kernelization demonstrate good performance compared with other approaches.

The rest of the paper is organized as follows. The problem description and the collaborative distributed clustering algorithm are presented in Section II. The kernelization of the proposed algorithm is given in Section III. The experimental results of different clustering algorithms on synthetic and real-world datasets are demonstrated in Section IV. Finally, conclusions are drawn in Section V.

II. NOVEL COLLABORATIVE DISTRIBUTED CLUSTERING ALGORITHM

A. Preliminaries and Problem Description

In our research, a distributed P2P network with J peers is modeled as an undirected graph G. Each peer \(j \in \{1, 2, \ldots, J\}\) is denoted as one node, and the edge between two nodes represents the communication link of the corresponding peers. Each peer j is allowed to communicate only with its immediate neighbors \(i \in NB_j\), where \(NB_j\) is the neighbor set of peer j. The graph is assumed connected, indicating that there exists at least one multi-hop communication route between any two peers. The distributed P2P network is deployed to collect the objects and perform the clustering task. Each peer j holds a set of \(N_j\) objects \(\mathbf{X}_j = \{\mathbf{x}_{jn} \mid n \in \{1, 2, \ldots, N_j\}\}\), where each object \(\mathbf{x}_{jn} = [x_{jn1}, x_{jn2}, \ldots, x_{jnM}]\) has M dimensions. We assume the same number of data clusters K for all peers. The clustering task of the distributed P2P network is to assign each object to one cluster \(k \in \{1, 2, \ldots, K\}\) based on a proper criterion chosen to quantify similarity among objects.

B. Collaborative Distributed Fuzzy C-Means Clustering Algorithm

With reference to our earlier research on attribute-weighted centralized clustering [32], we present a novel collaborative distributed fuzzy c-means clustering (CDFCM) algorithm for the P2P network environment. The new objective function is developed as (1) by combining the distributed weighted dissimilarity measure with an extra term for attribute-weight-entropy regularization.

\[
\min F(\mathbf{U},\mathbf{C},\mathbf{W}) = \sum_{j=1}^{J}\sum_{n=1}^{N_j}\sum_{k=1}^{K} u_{jnk}^{\alpha} \sum_{m=1}^{M} w_{jkm}\,(x_{jnm}-c_{jkm})^{2} + \gamma \sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{m=1}^{M} w_{jkm}\log w_{jkm} \tag{1}
\]

subject to

\[
c_{jkm}=c_{ikm},\ i\in NB_j; \qquad \sum_{k=1}^{K} u_{jnk}=1,\ 0\le u_{jnk}\le 1; \qquad w_{jkm}=w_{ikm},\ i\in NB_j; \qquad \sum_{m=1}^{M} w_{jkm}=1,\ 0\le w_{jkm}\le 1
\]

where \(\mathbf{U}=[u_{jnk}]\) is the membership degree matrix, and \(u_{jnk}\) denotes the membership degree of the n-th object belonging to the k-th cluster in the j-th peer. \(\mathbf{C}=[c_{jkm}]\) is the cluster prototype matrix, and \(c_{jkm}\) denotes the m-th dimension of the k-th cluster prototype in the j-th peer. \(\mathbf{W}=[w_{jkm}]\) is the attribute weight matrix, and \(w_{jkm}\) denotes the m-th dimension of the k-th cluster weight vector in the j-th peer. \(\alpha\) is the fuzzification coefficient and \(\gamma\) is a positive scalar.

In this new objective function, the first, distributed weighted distance term controls the shape and size of the clusters and encourages the agglomeration of clusters, while the second term is the negative entropy of the attribute weights, which regularizes the distribution of attribute weights in accordance with the available data. Thus we can simultaneously minimize the dispersion within clusters and maximize the entropy of attribute weights, so that important attributes contribute more to the identification of clusters. \(\gamma\) (\(\gamma > 0\)) is a positive, adjustable regularization parameter; with a proper choice of \(\gamma\), we can balance the two terms to find the optimal solution. It is worth stressing that the consensus constraints \(c_{jkm}=c_{ikm}\) and \(w_{jkm}=w_{ikm}\), \(i \in NB_j\), ensure that the local cluster prototypes and attribute weights yielded at each peer coincide with the global ones over all objects, i.e., our distributed clustering can yield results similar to those obtained by a centralized clustering method.

Minimizing F(U, C, W) subject to the constraints is a constrained nonlinear optimization problem. As in the traditional FCM algorithm, Picard iteration is applied to solve it. We first fix C and W and find the necessary conditions on U to minimize F(U). Then we fix W and U and minimize F(C) with respect to C. Finally, we fix U and C and minimize F(W) with respect to W. The matrices U, C and W are updated according to equations (2)-(6), respectively.

\[
u_{jnk} = \left[\,\sum_{h=1}^{K} \left( \frac{\sum_{m=1}^{M} w_{jkm}(x_{jnm}-c_{jkm})^{2}}{\sum_{m=1}^{M} w_{jhm}(x_{jnm}-c_{jhm})^{2}} \right)^{\frac{1}{\alpha-1}} \right]^{-1} \tag{2}
\]

for \(1 \le j \le J,\ 1 \le n \le N_j,\ 1 \le k \le K\)

\[
c_{jkm} = \frac{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} x_{jnm} \;-\; \sum_{i \in NB_j} p_{jikm}}{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm}} \tag{3}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\)

\[
p_{jikm} = p_{jikm} + \eta_1\,(c_{jkm}-c_{ikm}) \tag{4}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\)

\[
w_{jkm} = \frac{\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c_{jkm})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikm}\right)}{\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c_{jkl})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikl}\right)} \tag{5}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\)

\[
q_{jikm} = q_{jikm} + \eta_2\,(w_{jkm}-w_{ikm}) \tag{6}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\)

Here \(\mathbf{P}=[p_{jikm}]\) and \(\mathbf{Q}=[q_{jikm}]\) are two matrices containing the Lagrange multipliers corresponding to the consensus constraints \(c_{jkm}=c_{ikm}\) and \(w_{jkm}=w_{ikm}\), \(i \in NB_j\). They are defined for the iterative update of the cluster prototypes and attribute weights. \(\eta_1\) and \(\eta_2\) are positive scalars.
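For concreteness, the sketch below gives a minimal NumPy rendering of one Picard iteration of (2)-(6) at a single peer. It is illustrative only, not the authors' implementation; the array shapes, the small eps guard, and the numerical stabilization of the weight update are our own choices.

```python
import numpy as np

def cdfcm_step(X, C, W, C_nb, W_nb, P, Q, alpha, gamma, eta1, eta2):
    """One CDFCM Picard iteration at peer j, following (2)-(6).

    X          : (N, M) local objects of this peer
    C, W       : (K, M) local cluster prototypes and attribute weights
    C_nb, W_nb : (B, K, M) prototypes/weights last received from the B neighbors
    P, Q       : (B, K, M) Lagrange multipliers of the consensus constraints
    """
    eps = 1e-12
    sq = (X[:, None, :] - C[None, :, :]) ** 2            # (N, K, M)

    # (2): membership degrees from the attribute-weighted distances
    d = np.einsum('km,nkm->nk', W, sq) + eps             # (N, K)
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (1.0 / (alpha - 1.0))).sum(axis=2)

    # (3): prototypes, corrected by the multipliers summed over neighbors
    Ua = U ** alpha
    C = (np.einsum('nk,km,nm->km', Ua, W, X) - P.sum(axis=0)) / \
        (Ua.sum(axis=0)[:, None] * W + eps)

    # (4): gradient ascent on the prototype-consensus multipliers
    P = P + eta1 * (C[None, :, :] - C_nb)

    # (5): entropy-regularized attribute weights (a softmax over dimensions)
    sq = (X[:, None, :] - C[None, :, :]) ** 2
    expo = (-np.einsum('nk,nkm->km', Ua, sq) - 2.0 * Q.sum(axis=0)) / gamma
    expo -= expo.max(axis=1, keepdims=True)              # numerical stabilization
    W = np.exp(expo)
    W /= W.sum(axis=1, keepdims=True)

    # (6): gradient ascent on the weight-consensus multipliers
    Q = Q + eta2 * (W[None, :, :] - W_nb)
    return U, C, W, P, Q
```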

The essence of collaborative clustering is to explore the structures of each peer through peer exchanges. There are two main phases, namely clustering at the individual peer and interaction between neighboring peers by exchanging the findings. The two phases intertwine and occur in a fixed sequence. A general view of the processing of the proposed collaborative distributed clustering algorithm is shown in Fig. 1.

Fig. 1. The block diagram of the overall methodology of the proposed collaborative distributed clustering algorithm.

Initially, each peer generates its initial cluster prototypes and attribute weights, and communicates its local findings to its neighbors. Then an FCM-type algorithm is performed independently at each peer, pursuing its optimization by focusing on the local data and the findings communicated by neighboring peers at this point in time. After one iteration step, all peers are ready to engage in another communication phase: again they communicate their findings and set up new conditions for the next phase of FCM-type clustering. The pair of clustering and communication processes is referred to as the collaboration. The overall optimization takes a finite number of collaboration iterations, terminating once there is no further significant improvement in the revealed structure (cluster prototypes and attribute weights) of any peer.

Algorithm 1. CDFCM Clustering Algorithm

Input: the number of peers J, the number of objects \(N_j\) generated by the j-th peer, the number of data dimensions M, the number of clusters K, and the parameters \(\alpha\), \(\gamma\), \(\eta_1\) and \(\eta_2\).



Output: the accuracy of the clustering.

For each peer j (\(j \in \{1, 2, \ldots, J\}\)): randomly generate K initial cluster prototypes C(0); set each initial attribute weight to 1/M; set \(p_{jikm}(0)=0\) and \(q_{jikm}(0)=0\) for \(i \in NB_j\), \(1 \le k \le K\), \(1 \le m \le M\); broadcast the initial cluster prototypes and attribute weights to the neighboring peers; set the iteration index t = 0.

Repeat
  D: Update the partition matrix U(t+1) by (2) at each peer;
  D: Update the cluster prototypes C(t+1) by (3) at each peer;
  Each peer broadcasts the updated cluster prototypes to all its neighboring peers;
  D: Update the multipliers P(t+1) by (4) at each peer;
  D: Update the attribute weights W(t+1) by (5) at each peer;
  Each peer broadcasts the updated attribute weights to all its neighboring peers;
  D: Update the multipliers Q(t+1) by (6) at each peer;
  t++;
Until the termination condition of the collaboration activities has been satisfied.

In this algorithm, the iterative steps marked with “D” are performed in distributed mode: each peer updates its cluster memberships, cluster prototypes, and attribute weights based on its local information and the findings coming from its neighbors. When the variation of the cluster prototypes and attribute weights over two consecutive iterations is smaller than a preset threshold, a peer sends a “convergence” message to its neighbors. The iteration at a peer terminates once the peer has received “convergence” notifications from all its neighbors; then the agreement on the cluster prototypes and attribute weights of all peers is achieved. Under the premise of clustering performance coinciding with that of centralized clustering methods, the CDFCM algorithm reduces and evens out the data-transmission energy consumption among peers. In the ensuing sub-sections, we give the iterative optimization analysis and the consistency analysis.
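The following driver (again a sketch; the whole network is simulated in a single process, and the names tol and max_iter are our own) shows how Algorithm 1 can be exercised. Each simulated peer runs cdfcm_step from the sketch above on its own data and “broadcasts” the new prototypes and weights to its neighbors; the stopping test plays the role of the “convergence” messaging.

```python
import numpy as np

def run_cdfcm(peers, NB, K, alpha=2.0, gamma=0.1, eta1=0.01, eta2=0.01,
              tol=1e-4, max_iter=500):
    """Simulated CDFCM collaboration. peers[j] is the (N_j, M) data of peer j;
    NB[j] lists the indices of peer j's neighbors (e.g. a linear topology)."""
    J, M = len(peers), peers[0].shape[1]
    rng = np.random.default_rng(0)
    C = [rng.random((K, M)) for _ in range(J)]         # random initial prototypes
    W = [np.full((K, M), 1.0 / M) for _ in range(J)]   # initial weights = 1/M
    P = [np.zeros((len(NB[j]), K, M)) for j in range(J)]
    Q = [np.zeros((len(NB[j]), K, M)) for j in range(J)]
    U = [None] * J
    for _ in range(max_iter):
        C_old = [c.copy() for c in C]
        W_old = [w.copy() for w in W]
        for j in range(J):   # in a real P2P network the peers run concurrently
            C_nb = np.stack([C_old[i] for i in NB[j]])   # neighbors' broadcasts
            W_nb = np.stack([W_old[i] for i in NB[j]])
            U[j], C[j], W[j], P[j], Q[j] = cdfcm_step(
                peers[j], C[j], W[j], C_nb, W_nb, P[j], Q[j],
                alpha, gamma, eta1, eta2)
        if all(np.abs(C[j] - C_old[j]).max() < tol and
               np.abs(W[j] - W_old[j]).max() < tol for j in range(J)):
            break   # every peer would have sent its "convergence" message
    return U, C, W
```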

C. Iterative Optimization Analysis

In this sub-section, the three updating equations (2), (3), and (5) used for the Picard iteration are proved by the following theorems.

Theorem 1. Let C and W be fixed; F(U) is locally minimized if U is given by (2).

Proof. If C and W are fixed, we use the Lagrangian multiplier technique to reformulate min F into the following unconstrained minimization problem with respect to U.

\[
\min G(\mathbf{U},\boldsymbol{\Lambda}) = \sum_{j=1}^{J}\sum_{n=1}^{N_j}\sum_{k=1}^{K} u_{jnk}^{\alpha}\sum_{m=1}^{M} w_{jkm}(x_{jnm}-c_{jkm})^{2} + \gamma\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{m=1}^{M} w_{jkm}\log w_{jkm} - \sum_{j=1}^{J}\sum_{n=1}^{N_j}\lambda_{jn}\!\left(\sum_{k=1}^{K} u_{jnk}-1\right) \tag{7}
\]

where \(\boldsymbol{\Lambda}=[\lambda_{jn}]\) is the Lagrange multiplier matrix corresponding to the constraint \(\sum_{k=1}^{K} u_{jnk}=1\).

By setting the gradient of \(G(\mathbf{U},\boldsymbol{\Lambda})\) to zero with respect to \(\lambda_{jn}\) and \(u_{jnk}\), we obtain

\[
\frac{\partial G(\mathbf{U},\boldsymbol{\Lambda})}{\partial \lambda_{jn}} = -\left(\sum_{k=1}^{K} u_{jnk}-1\right) = 0 \tag{8}
\]

\[
\frac{\partial G(\mathbf{U},\boldsymbol{\Lambda})}{\partial u_{jnk}} = \alpha\, u_{jnk}^{\alpha-1}\sum_{m=1}^{M} w_{jkm}(x_{jnm}-c_{jkm})^{2} - \lambda_{jn} = 0 \tag{9}
\]

for \(1 \le j \le J,\ 1 \le n \le N_j,\ 1 \le k \le K\).

From (8) and (9), we have (2). This completes the proof.

Theorem 2. Let U and W be fixed; F(C) is locally minimized if C is given via (3) and (4).

Proof. If U and W are fixed, the Lagrangian multiplier technique is used to obtain the following unconstrained minimization problem with respect to C.

\[
\min G(\mathbf{C},\mathbf{P}) = \sum_{j=1}^{J}\sum_{n=1}^{N_j}\sum_{k=1}^{K} u_{jnk}^{\alpha}\sum_{m=1}^{M} w_{jkm}(x_{jnm}-c_{jkm})^{2} + \gamma\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{m=1}^{M} w_{jkm}\log w_{jkm} + \sum_{j=1}^{J}\sum_{i\in NB_j}\sum_{k=1}^{K}\sum_{m=1}^{M} p_{jikm}(c_{jkm}-c_{ikm}) \tag{10}
\]

where \(\mathbf{P}=[p_{jikm}]\) is the Lagrange multiplier matrix corresponding to the consensus constraint \(c_{jkm}=c_{ikm}\), \(i\in NB_j\).

By setting the gradient of \(G(\mathbf{C},\mathbf{P})\) to zero with respect to \(c_{jkm}\), we obtain

\[
\frac{\partial G(\mathbf{C},\mathbf{P})}{\partial c_{jkm}} = -2\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm}(x_{jnm}-c_{jkm}) + \sum_{i\in NB_j} p_{jikm} - \sum_{i\in NB_j} p_{ijkm} = 0 \tag{11}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). From (11), we have

\[
c_{jkm} = \frac{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} x_{jnm} - \frac{1}{2}\sum_{i\in NB_j}\left(p_{jikm}-p_{ijkm}\right)}{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm}} \tag{12}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). It is worth pointing out that the update of the cluster prototypes is followed by a gradient ascent step over the multipliers \(p_{jikm}\), as \(p_{jikm} = p_{jikm} + \eta_1(c_{jkm}-c_{ikm})\) for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\), where \(\eta_1\) is a positive scalar. If \(p_{jikm}(0)\) and \(p_{ijkm}(0)\) are initialized to zero, we have \(p_{jikm}(t) = -p_{ijkm}(t)\) for all \(t>0\), where t denotes the iteration index. Substituting this into (12), we obtain (3). This completes the proof.


Theorem 3. Let U and C be fixed; F(W) is locally minimized if W is given by (5) and (6).

Proof. If U and C are fixed, the Lagrangian multiplier technique is used to obtain the following unconstrained minimization problem with respect to W.

\[
\min G(\mathbf{W},\boldsymbol{\Lambda},\mathbf{Q}) = \sum_{j=1}^{J}\sum_{n=1}^{N_j}\sum_{k=1}^{K} u_{jnk}^{\alpha}\sum_{m=1}^{M} w_{jkm}(x_{jnm}-c_{jkm})^{2} + \gamma\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{m=1}^{M} w_{jkm}\log w_{jkm} - \sum_{j=1}^{J}\sum_{k=1}^{K}\lambda_{jk}\!\left(\sum_{m=1}^{M} w_{jkm}-1\right) + \sum_{j=1}^{J}\sum_{i\in NB_j}\sum_{k=1}^{K}\sum_{m=1}^{M} q_{jikm}(w_{jkm}-w_{ikm}) \tag{13}
\]

where \(\boldsymbol{\Lambda}=[\lambda_{jk}]\) is the Lagrange multiplier matrix corresponding to the constraint \(\sum_{m=1}^{M} w_{jkm}=1\), and \(\mathbf{Q}=[q_{jikm}]\) is the Lagrange multiplier matrix corresponding to the consensus constraint \(w_{jkm}=w_{ikm}\), \(i\in NB_j\).

By setting the gradient of \(G(\mathbf{W},\boldsymbol{\Lambda},\mathbf{Q})\) to zero with respect to \(\lambda_{jk}\) and \(w_{jkm}\), we obtain

\[
\frac{\partial G(\mathbf{W},\boldsymbol{\Lambda},\mathbf{Q})}{\partial \lambda_{jk}} = -\left(\sum_{m=1}^{M} w_{jkm}-1\right) = 0 \tag{14}
\]

\[
\frac{\partial G(\mathbf{W},\boldsymbol{\Lambda},\mathbf{Q})}{\partial w_{jkm}} = \sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c_{jkm})^{2} + \gamma(\log w_{jkm}+1) - \lambda_{jk} + \sum_{i\in NB_j} q_{jikm} - \sum_{i\in NB_j} q_{ijkm} = 0 \tag{15}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). From (15), we have

\[
w_{jkm} = \exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c_{jkm})^{2} - \gamma^{-1}\!\left(\sum_{i\in NB_j} q_{jikm}-\sum_{i\in NB_j} q_{ijkm}\right)\right)\cdot \exp\!\left(\gamma^{-1}\lambda_{jk}-1\right) \tag{16}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). Substituting (16) into (14), we have

\[
\sum_{l=1}^{M} w_{jkl} = \exp\!\left(\gamma^{-1}\lambda_{jk}-1\right)\cdot\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c_{jkl})^{2} - \gamma^{-1}\!\left(\sum_{i\in NB_j} q_{jikl}-\sum_{i\in NB_j} q_{ijkl}\right)\right) = 1 \tag{17}
\]

It follows that

\[
\exp\!\left(\gamma^{-1}\lambda_{jk}-1\right) = \left[\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c_{jkl})^{2} - \gamma^{-1}\!\left(\sum_{i\in NB_j} q_{jikl}-\sum_{i\in NB_j} q_{ijkl}\right)\right)\right]^{-1} \tag{18}
\]

Substituting (18) into (16), we obtain

\[
w_{jkm} = \frac{\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c_{jkm})^{2} - \gamma^{-1}\sum_{i\in NB_j}(q_{jikm}-q_{ijkm})\right)}{\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c_{jkl})^{2} - \gamma^{-1}\sum_{i\in NB_j}(q_{jikl}-q_{ijkl})\right)} \tag{19}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). Similar to the cluster prototypes, the update of the attribute weights is followed by a gradient ascent step over the multipliers \(q_{jikm}\), as \(q_{jikm} = q_{jikm} + \eta_2(w_{jkm}-w_{ikm})\) for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\), where \(\eta_2\) is a positive scalar. If \(q_{jikm}(0)\) and \(q_{ijkm}(0)\) are initialized to zero, we have \(q_{jikm}(t) = -q_{ijkm}(t)\) for all \(t>0\), where t denotes the iteration index. Substituting this into (19), we obtain (5). This completes the proof.

D. Consistency Analysis

In this sub-section, we prove that the distributed clustering solution obtained by the CDFCM algorithm coincides with that of the centralized clustering method, including the cluster prototypes and the attribute weight assignment.

Theorem 4. The cluster prototypes obtained by the CDFCM algorithm are consistent with those of the centralized clustering method.

Proof. From (3), we have

\[
\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} c_{jkm} = \sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} x_{jnm} - \sum_{i\in NB_j} p_{jikm} \tag{20}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\). Then we have

\[
\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} c_{jkm} = \sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm} x_{jnm} - \sum_{j=1}^{J}\sum_{i\in NB_j} p_{jikm} \tag{21}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). Let \(\mathbf{c}'_k=[c'_{k1}, c'_{k2}, \ldots, c'_{kM}]\) and \(\mathbf{w}'_k=[w'_{k1}, w'_{k2}, \ldots, w'_{kM}]\) for \(1 \le k \le K\) be the consensus cluster prototypes and attribute weights reached by the CDFCM algorithm. From (21), we have

\[
c'_{km}\, w'_{km}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} = w'_{km}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} x_{jnm} - \sum_{j=1}^{J}\sum_{i\in NB_j} p_{jikm} \tag{22}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). It follows that

\[
c'_{km} = \frac{w'_{km}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} x_{jnm} - \sum_{j=1}^{J}\sum_{i\in NB_j} p_{jikm}}{w'_{km}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}} \tag{23}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). Let \(p_{jikm}(0)\) and \(p_{ijkm}(0)\) be initialized to zero; from (4), we have \(p_{jikm}(t) = -p_{ijkm}(t)\) for all \(t>0\), where t denotes the iteration index. Then we have

\[
\sum_{j=1}^{J}\sum_{i\in NB_j} p_{jikm} = 0 \tag{24}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). Substituting (24) into (23), we obtain

\[
c'_{km} = \frac{\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha} x_{jnm}}{\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}} \tag{25}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). If all objects are collected from all peers into one central unit, it can be seen that the cluster prototypes in (25) are consistent with the centralized ones in (26) obtained by the entropy weighting k-means (EWKM) algorithm [31] and its fuzzified version, the weighted entropy-regularized fuzzy c-means (WEFCM) algorithm [32], two attribute-weighted centralized clustering methods. This completes the proof.

\[
c_{km} = \frac{\sum_{n=1}^{N} u_{nk}^{\alpha} x_{nm}}{\sum_{n=1}^{N} u_{nk}^{\alpha}} \tag{26}
\]

where N is the total number of all objects.

Theorem 5. The attribute weights obtained by the CDFCM algorithm are consistent with those of the centralized clustering method.

Proof. Let \(\mathbf{w}'_k=[w'_{k1}, w'_{k2}, \ldots, w'_{kM}]\) for \(1 \le k \le K\) be the consensus attribute weights reached by CDFCM. From (5), we have

\[
\left(w'_{km}\right)^{J} = \prod_{j=1}^{J} w_{jkm} = \prod_{j=1}^{J}\frac{\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c'_{km})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikm}\right)}{\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c'_{kl})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikl}\right)} \tag{27}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). It follows that

\[
w'_{km} = \frac{\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c'_{km})^{2} - 2\gamma^{-1}\sum_{j=1}^{J}\sum_{i\in NB_j} q_{jikm}\right)}}{\sqrt[J]{\prod_{j=1}^{J}\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c'_{kl})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikl}\right)}} \tag{28}
\]

for \(1 \le k \le K,\ 1 \le m \le M\).

According to the constraint of attribute weights, we have

\[
\sum_{m=1}^{M} w'_{km} = 1 \tag{29}
\]

for \(1 \le k \le K\). Then we have

\[
\sum_{m=1}^{M}\frac{\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c'_{km})^{2} - 2\gamma^{-1}\sum_{j=1}^{J}\sum_{i\in NB_j} q_{jikm}\right)}}{\sqrt[J]{\prod_{j=1}^{J}\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c'_{kl})^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikl}\right)}} = 1 \tag{30}
\]

From (28) and (30), we obtain

\[
w'_{km} = \frac{\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c'_{km})^{2} - 2\gamma^{-1}\sum_{j=1}^{J}\sum_{i\in NB_j} q_{jikm}\right)}}{\sum_{l=1}^{M}\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c'_{kl})^{2} - 2\gamma^{-1}\sum_{j=1}^{J}\sum_{i\in NB_j} q_{jikl}\right)}} \tag{31}
\]

Let \(q_{jikm}(0)\) and \(q_{ijkm}(0)\) be initialized to zero; from (6), we have \(q_{jikm}(t) = -q_{ijkm}(t)\) for all \(t>0\), where t denotes the iteration index. Then we have

\[
\sum_{j=1}^{J}\sum_{i\in NB_j} q_{jikm} = 0 \tag{32}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). Substituting (32) into (31), we obtain

\[
w'_{km} = \frac{\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnm}-c'_{km})^{2}\right)}}{\sum_{l=1}^{M}\sqrt[J]{\exp\!\left(-\gamma^{-1}\sum_{j=1}^{J}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}(x_{jnl}-c'_{kl})^{2}\right)}} \tag{33}
\]

for \(1 \le k \le K,\ 1 \le m \le M\). Similar to the consistency proof for the cluster prototypes, by comparing (33) with the attribute weights (34) obtained by the centralized clustering methods, the EWKM and WEFCM algorithms [31-32], we see that (34) is a special case of (33) in which all the data are collected at one node. This completes the proof.

\[
w_{km} = \frac{\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N} u_{nk}^{\alpha}(x_{nm}-c_{km})^{2}\right)}{\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N} u_{nk}^{\alpha}(x_{nl}-c_{kl})^{2}\right)} \tag{34}
\]

where N is the total number of all objects.

III. KERNELIZATION OF THE CDFCM ALGORITHM

Since FCM uses the squared norm to evaluate the similarity between objects and prototypes, it is only effective for clustering data with ‘spherical’ clusters. For data with ‘non-spherical’ clusters, the idea of performing the clustering in a high-dimensional feature space with a Mercer-kernel-based mapping can be considered [34]. The essence of the kernel method is to perform a non-linear mapping Φ from the original d-dimensional space \(\mathbb{R}^d\) to a high-dimensional kernel space H [38]. Then a linear classifier in the kernel space can be used to solve a clustering problem that may be highly non-linear in the original feature space. Recently, the kernel method has been widely applied to fuzzy clustering, which is referred to as kernel-based fuzzy clustering [34-37]. These approaches can be divided into two major types [36]: the first comes with prototypes constructed in the original feature space and is referred to as KFCM-F, while the other confines the prototypes to the kernel space and is referred to as KFCM-K [36]. Considering the need to exchange cluster prototype and attribute weight messages in the original feature space, we focus on the first type of kernel method and propose the kernel-based CDFCM (KCDFCM) algorithm. The new objective function is defined as (35).

\[
\min F(\mathbf{U},\mathbf{C},\mathbf{W}) = \sum_{j=1}^{J}\sum_{n=1}^{N_j}\sum_{k=1}^{K} u_{jnk}^{\alpha}\sum_{m=1}^{M} w_{jkm}\left(\Phi(x_{jnm})-\Phi(c_{jkm})\right)^{2} + \gamma\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{m=1}^{M} w_{jkm}\log w_{jkm} \tag{35}
\]

subject to

\[
c_{jkm}=c_{ikm},\ i\in NB_j; \qquad \sum_{k=1}^{K} u_{jnk}=1,\ 0\le u_{jnk}\le 1; \qquad w_{jkm}=w_{ikm},\ i\in NB_j; \qquad \sum_{m=1}^{M} w_{jkm}=1,\ 0\le w_{jkm}\le 1
\]

where \(\mathbf{U}=[u_{jnk}]\), \(\mathbf{C}=[c_{jkm}]\) and \(\mathbf{W}=[w_{jkm}]\) are the membership degree matrix, the cluster prototype matrix and the attribute weight matrix, respectively, as defined in (1). Φ is the non-linear mapping from the original feature space to the kernel space, \(\alpha\) is the fuzzification coefficient, and \(\gamma\) is a positive scalar.

Kernel methods take advantage of the fact that dot products in the kernel space can be expressed through a Mercer kernel K given by \(K(\mathbf{x},\mathbf{y})=\Phi(\mathbf{x})^{\mathrm{T}}\Phi(\mathbf{y})\), where \(\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}\). Commonly used Mercer kernels include the Gaussian kernel, the polynomial kernel, and so on [38]. In this paper, the Gaussian kernel \(K(\mathbf{x},\mathbf{y})=\exp\!\left(-\|\mathbf{x}-\mathbf{y}\|^{2}/\sigma^{2}\right)\), \(\sigma>0\), is considered.

According to the definition of the kernel function, we have 2

T

T T T

2

2

( ) ( )

( ( ) ( )) ( ( ) ( ))

( ) ( ) ( ) ( ) 2 ( ) ( )

( , ) ( , ) 2 ( , )

2 2exp( )

jnm jkm

jnm jkm jnm jkm

jnm jnm jkm jkm jnm jkm

jnm jnm jkm jkm jnm jkm

jnm jkm

x c

x c x c

x x c c x c

K x x K c c K x c

x c

Φ Φ

Φ Φ Φ Φ

Φ Φ Φ Φ Φ Φ

σ

= − −

= + −

= + −

− −= −

(36)
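As a small illustration, (36) means the kernel-space distance never needs Φ explicitly; a one-line sketch (with sigma2 standing for σ²):

```python
import numpy as np

def phi_dist2(x, c, sigma2):
    """(36): ||Phi(x) - Phi(c)||^2 = 2 - 2*exp(-(x - c)^2 / sigma^2)."""
    return 2.0 - 2.0 * np.exp(-(x - c) ** 2 / sigma2)
```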

Picard iteration is also applied to solve the optimization problem (35). The matrices U, C and W are updated according to equations (37)-(41), respectively. The iterative optimization process is the same as in the CDFCM algorithm.

\[
u_{jnk} = \left[\,\sum_{h=1}^{K}\left(\frac{\sum_{m=1}^{M} w_{jkm}\left(\Phi(x_{jnm})-\Phi(c_{jkm})\right)^{2}}{\sum_{m=1}^{M} w_{jhm}\left(\Phi(x_{jnm})-\Phi(c_{jhm})\right)^{2}}\right)^{\frac{1}{\alpha-1}}\right]^{-1} \tag{37}
\]

for \(1 \le j \le J,\ 1 \le n \le N_j,\ 1 \le k \le K\)

\[
c_{jkm} = \frac{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm}\exp\!\left(-\frac{(x_{jnm}-c_{jkm})^{2}}{\sigma^{2}}\right) x_{jnm} - \frac{\sigma^{2}}{2}\sum_{i\in NB_j} p_{jikm}}{\sum_{n=1}^{N_j} u_{jnk}^{\alpha} w_{jkm}\exp\!\left(-\frac{(x_{jnm}-c_{jkm})^{2}}{\sigma^{2}}\right)} \tag{38}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\)

\[
p_{jikm} = p_{jikm} + \eta_1\,(c_{jkm}-c_{ikm}) \tag{39}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\)

\[
w_{jkm} = \frac{\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}\left(\Phi(x_{jnm})-\Phi(c_{jkm})\right)^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikm}\right)}{\sum_{l=1}^{M}\exp\!\left(-\gamma^{-1}\sum_{n=1}^{N_j} u_{jnk}^{\alpha}\left(\Phi(x_{jnl})-\Phi(c_{jkl})\right)^{2} - 2\gamma^{-1}\sum_{i\in NB_j} q_{jikl}\right)} \tag{40}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M\)

\[
q_{jikm} = q_{jikm} + \eta_2\,(w_{jkm}-w_{ikm}) \tag{41}
\]

for \(1 \le j \le J,\ 1 \le k \le K,\ 1 \le m \le M,\ i \in NB_j\)

Here \(\mathbf{P}=[p_{jikm}]\) and \(\mathbf{Q}=[q_{jikm}]\) are the two Lagrange multiplier matrices, and \(\eta_1\) and \(\eta_2\) are positive scalars.
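The distinctive step of KCDFCM is the prototype update (38), which is a fixed-point expression because the Gaussian kernel term itself depends on \(c_{jkm}\). In the sketch below (illustrative only, not the authors' code) we follow the common treatment of evaluating the kernel at the prototypes from the previous iteration, one fixed-point step per Picard iteration.

```python
import numpy as np

def kcdfcm_prototypes(X, U, W, C_prev, P, alpha, sigma2):
    """(38): one kernel-weighted prototype update at a single peer, with the
    Gaussian kernel evaluated at the previous prototypes C_prev."""
    Ua = U ** alpha                                                     # (N, K)
    kern = np.exp(-(X[:, None, :] - C_prev[None, :, :]) ** 2 / sigma2)  # (N, K, M)
    num = np.einsum('nk,km,nkm,nm->km', Ua, W, kern, X) \
          - 0.5 * sigma2 * P.sum(axis=0)
    den = np.einsum('nk,km,nkm->km', Ua, W, kern) + 1e-12
    return num / den
```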

IV. EXPERIMENTS

To evaluate the performance of the proposed algorithms (CDFCM and KCDFCM), several FCM-type clustering algorithms are chosen for comparative analysis, including the centralized clustering methods (FCM [5] and WEFCM [32]), the kernel-based clustering method (KFCM-F [36]), the parallel clustering method (PFCM [20]), and the distributed clustering method (Soft-DKM [25]). All the clustering algorithms are implemented in C++ on a computer with a 3.4 GHz CPU and 4 GB RAM. Table I lists the parameter settings of the seven FCM-type clustering algorithms. The fuzzification coefficient α is an important parameter that affects the performance of the clustering and the final shapes of the clusters: when the coefficient is close to 1, fuzzy c-means clustering behaves more like the hard k-means algorithm. In order to keep the fuzziness of the clustering, the value of the coefficient is usually selected in the range [1.1, 5.0]; in this paper, we use the most broadly selected value, 2. A series of experiments is performed on various datasets, including synthetic and real-world data. Each algorithm is executed on each dataset 100 times, and the cluster prototypes are randomly initialized each time. Since the ranges of values of raw data vary widely, the objective functions of the clustering algorithms will not work properly without normalization. For example, the majority of classifiers calculate the distance between two points by the Euclidean distance; if one of the features has a broad range of values, the distance will be dominated by this particular feature. Therefore, the ranges of all features should be normalized so that each feature contributes approximately proportionately to the final distance. In this paper, we adopt a simple normalization method that rescales the range of each feature to [0, 1] as in (42).

\[
x'_{ij} = \frac{x_{ij} - x_{j}^{\min}}{x_{j}^{\max} - x_{j}^{\min}} \tag{42}
\]

where \(x_{ij}\) is an original value, \(x'_{ij}\) is the normalized value, \(x_{j}^{\max}=\max_i\{x_{ij}\}\), and \(x_{j}^{\min}=\min_i\{x_{ij}\}\).
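A one-function sketch of (42), guarding against constant features (the guard is our own addition):

```python
import numpy as np

def minmax_normalize(X):
    """(42): rescale every feature (column) of X into [0, 1]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / np.where(xmax > xmin, xmax - xmin, 1.0)
```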

TABLE I
PARAMETER SETTINGS OF SEVEN FCM-TYPE CLUSTERING ALGORITHMS

| Algorithm | Parameter setting |
|---|---|
| FCM | α = 2 |
| WEFCM | α = 2; γ varied from 0.01 to 1 |
| KFCM-F | α = 2; σ² varied from 2^-10 to 2^5 |
| PFCM | α = 2 |
| Soft-DKM | α = 2; η varied from 10^-8 to 100 |
| CDFCM | α = 2; γ varied from 0.01 to 1; η₁ varied from 10^-8 to 100; η₂ varied from 10^-8 to 100 |
| KCDFCM | α = 2; σ² varied from 2^-10 to 2^5; γ varied from 0.01 to 1; η₁ varied from 10^-8 to 100; η₂ varied from 10^-8 to 100 |

A. Performance Metrics

For the purpose of comparative analysis, four kinds of performance metrics are applied in the experiments.

• Iteration Number (IN): The iteration number is a common measure of the speed of convergence of a clustering algorithm. Over the 100 executions, the average iteration number (AIN) is used in the experiments.

• Classification Rate (CR): The classification rate (the larger the better) measures how well the clustering algorithm performs on a given dataset with a known cluster structure [36]. It is computed by (43) and expressed as a percentage in this paper. Over the 100 executions, the average classification rate (ACR) is used in the experiments.

\[
CR = \frac{\sum_{k=1}^{K} d_k}{N} \tag{43}
\]

where \(d_k\) is the number of objects correctly identified in the k-th cluster, and N is the total number of objects in the dataset.
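Computing \(d_k\) requires matching each discovered cluster to a ground-truth class. A simple sketch (assuming integer-coded labels; this greedy majority-label matching is our own simplification, while the Hungarian algorithm would give the optimal matching):

```python
import numpy as np

def classification_rate(pred, truth, K):
    """(43): CR = (sum_k d_k) / N, with d_k taken as the count of the
    majority true label inside the k-th discovered cluster."""
    correct = 0
    for k in range(K):
        members = truth[pred == k]
        if members.size:
            correct += np.bincount(members).max()
    return correct / truth.size
```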

• Normalized Mutual Information (NMI): The normalized mutual information (the larger the better) provides a symmetric measure to quantify the statistical information shared between two cluster distributions [39]. Over the 100 executions, the average normalized mutual information (ANMI) is used in the experiments.

\[
NMI(R,Q) = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} P(i,j)\log\dfrac{P(i,j)}{P(i)\,P(j)}}{\sqrt{H(R)\,H(Q)}} \tag{44}
\]

where R and Q are two partitions of the dataset with I and J clusters, respectively. P(i) is the probability that a randomly selected object from the dataset falls into cluster \(R_i\) in partition R, and P(i, j) denotes the probability that an object belongs to cluster \(R_i\) in R and cluster \(Q_j\) in Q. H(R) is the entropy associated with the probabilities P(i) (\(1 \le i \le I\)) of partition R.
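A compact sketch of (44) for two integer-coded label vectors (assuming, as in [39], the \(\sqrt{H(R)H(Q)}\) normalization):

```python
import numpy as np

def nmi(r, q):
    """(44): normalized mutual information between partitions r and q."""
    n = r.size
    joint = np.zeros((r.max() + 1, q.max() + 1))
    for a, b in zip(r, q):
        joint[a, b] += 1.0 / n                     # joint distribution P(i, j)
    pr, pq = joint.sum(axis=1), joint.sum(axis=0)  # marginals P(i), P(j)
    mask = joint > 0
    mi = (joint[mask] * np.log(joint[mask] / np.outer(pr, pq)[mask])).sum()
    hr = -(pr[pr > 0] * np.log(pr[pr > 0])).sum()
    hq = -(pq[pq > 0] * np.log(pq[pq > 0])).sum()
    return mi / np.sqrt(hr * hq)
```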

• Transmission Energy Consumption (TEC): This denotes the energy consumption of each sensor for data transmission in WSNs and gives an indication of the network state. In the experiments, we adopt the energy model introduced in [40-41]. Over the 100 executions, the average transmission energy consumption (ATEC) and the average maximum transmission energy consumption among sensors (AMEC) are calculated; both are the-smaller-the-better indices. If a data packet with a k-bit message is transmitted a distance d (m) from node \(N_i\) to node \(N_j\), the energy consumed by node \(N_i\) is

\[
E_{N_i}(k,d) = E_{elec}\cdot k + E_{amp}\cdot k\cdot d^{2} \tag{45}
\]

and the energy consumed by node \(N_j\) is

\[
E_{N_j}(k) = E_{elec}\cdot k \tag{46}
\]

where \(E_{elec} = 50\ \mathrm{nJ/bit}\) and \(E_{amp} = 100\ \mathrm{pJ/bit/m^2}\).
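In code, the radio model (45)-(46) amounts to (a sketch using the constants above):

```python
E_ELEC = 50e-9   # 50 nJ/bit: electronics cost per bit, sender and receiver
E_AMP = 100e-12  # 100 pJ/bit/m^2: amplifier cost per bit per square meter

def tx_energy(k_bits, d):
    """(45): energy spent by node N_i to send k bits over distance d (m)."""
    return E_ELEC * k_bits + E_AMP * k_bits * d ** 2

def rx_energy(k_bits):
    """(46): energy spent by node N_j to receive k bits."""
    return E_ELEC * k_bits
```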

B. WSN-based Synthetic Datasets

This section works on a WSN in which sensor nodes are randomly distributed over a 200 m × 200 m region. The communication range of each sensor is set to 40 m, i.e., each sensor will only exchange information with the neighbors within its communication range. Fig. 2 illustrates a sample of such a WSN containing 50 sensor nodes.

Fig. 2. The random distribution of 50 sensor nodes in a WSN.
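The neighbor sets \(NB_j\) used by the algorithm follow directly from the node positions and the 40 m communication range; a sketch (positions being a (J, 2) coordinate array, a name of our own):

```python
import numpy as np

def neighbor_sets(positions, comm_range=40.0):
    """Build NB_j: the peers within the communication range of peer j."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return [np.nonzero((dist[j] <= comm_range) & (dist[j] > 0))[0].tolist()
            for j in range(len(positions))]
```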

The WSN is deployed to collect the monitoring data and perform the clustering task. Assume each sensor node collects 150 objects belonging to 3 clusters (K=3), and each object has 6 attributes (M=6). Three synthetic datasets are created in the WSN for our experiments: S1 collects 4500 data using 30 sensor nodes (N=4500, J=30), S2 collects 7500 data using 50 sensor nodes (N=7500, J=50), and S3 collects 10500 data using 70 sensor nodes (N=10500, J=70). The dataset generation algorithm is summarized in the appendix. According to this algorithm, the data dispersion is mainly controlled by the parameters \(\tau_{km}\) (\(\mathbf{T}=[\tau_{km}],\ 1\le k\le 3,\ 1\le m\le 6\)); a large \(\tau_{km}\) indicates that the data are more agglomerated in the m-th dimension of the k-th cluster, and vice versa. To study the impact of each data dimension's agglomeration degree on the clustering,

\[
\mathbf{T} = [\tau_{km}] = \begin{bmatrix} 4 & 1 & 1 & 1 & 1 & 1\\ 1 & 4 & 1 & 1 & 1 & 1\\ 1 & 1 & 4 & 1 & 1 & 1 \end{bmatrix}
\]

is assigned with respect to the 6 data dimensions of the 3 clusters. Fig. 3 illustrates the distribution of the different dimensions/attributes of dataset S2 with 50 sensor nodes. Because \(\tau_{k5}\) and \(\tau_{k6}\) are equal to \(\tau_{k4}\), the distributions of attribute-5 and attribute-6 are similar to that of attribute-4, so we only present the distribution of the first four attributes.
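The full generation procedure lives in the paper's appendix and is not reproduced here; the sketch below only illustrates the role of T, under our own assumption that each cluster is Gaussian with per-dimension standard deviation proportional to \(1/\tau_{km}\) (so \(\tau_{km}=4\) gives the compact dimension):

```python
import numpy as np

def generate_peer_data(T, n_per_cluster=50, rng=None):
    """Toy stand-in for the appendix algorithm: K clusters in M dimensions,
    with the spread of dimension m in cluster k shrinking as tau_km grows."""
    if rng is None:
        rng = np.random.default_rng()
    K, M = T.shape
    centers = rng.uniform(0.0, 10.0, size=(K, M))    # arbitrary cluster centers
    X = np.concatenate([
        rng.normal(centers[k], 1.0 / T[k], size=(n_per_cluster, M))
        for k in range(K)])
    y = np.repeat(np.arange(K), n_per_cluster)
    return X, y
```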


Fig. 3. Distribution of the first four attributes of the WSN-based Synthetic dataset with 50 sensor nodes. (a) Attribute-1. (b) Attribute-2. (c) Attribute-3. (d) Attribute-4.

TABLE II
STATISTICS OF DIFFERENT CLUSTERING ALGORITHMS ON THE WSN-BASED SYNTHETIC DATASETS IN TERMS OF AIN, ACR(%), ANMI, ATEC(10^-2 J), AND AMEC(10^-2 J)

| Dataset | Metric | FCM | WEFCM | KFCM-F | PFCM | Soft-DKM | CDFCM | KCDFCM |
|---|---|---|---|---|---|---|---|---|
| S1 | AIN | *20.2 | 21.8 | 25.4 | 38.2 | 31.4 | 38.4 | 50.4 |
| S1 | ACR | 90.31 | 97.24 | 91.64 | 90.31 | 88.72 | 96.88 | *97.54 |
| S1 | ANMI | 0.7104 | 0.8864 | 0.7195 | 0.7104 | 0.7065 | 0.8798 | *0.8912 |
| S1 | ATEC | 4.3509 | 4.3509 | 4.3509 | 2.7517 | *0.7094 | 1.7351 | 2.2773 |
| S1 | AMEC | 15.5182 | 15.5182 | 15.5182 | 8.2979 | *0.8681 | 2.1234 | 2.7869 |
| S2 | AIN | *24.2 | 26.6 | 28.6 | 40.2 | 35.4 | 41.8 | 53.2 |
| S2 | ACR | 90.62 | *97.16 | 91.72 | 90.62 | 89.70 | 96.94 | 97.00 |
| S2 | ANMI | 0.7205 | *0.8885 | 0.7268 | 0.7205 | 0.7198 | 0.8826 | 0.8842 |
| S2 | ATEC | 4.1836 | 4.1836 | 4.1836 | 2.9069 | *1.0182 | 2.4045 | 3.0602 |
| S2 | AMEC | 22.9313 | 22.9313 | 22.9313 | 9.7324 | *1.3186 | 3.1139 | 3.9632 |
| S3 | AIN | 25.0 | 28.4 | *24.6 | 38.4 | 32.8 | 34.6 | 40.4 |
| S3 | ACR | 90.33 | *97.02 | 91.65 | 90.33 | 90.33 | 96.25 | 96.81 |
| S3 | ANMI | 0.7107 | *0.8879 | 0.7193 | 0.7107 | 0.7107 | 0.8832 | 0.8844 |
| S3 | ATEC | 4.3849 | 4.3849 | 4.3849 | 2.7391 | *1.1003 | 2.3213 | 2.7104 |
| S3 | AMEC | 17.2958 | 17.2958 | 17.2958 | 9.3928 | *1.4422 | 3.0426 | 3.5526 |

* The best performance among the group.

Table II lists the clustering results of the different clustering algorithms on the three WSN-based synthetic datasets S1, S2, and S3. Because the results on the different datasets are similar, without loss of generality we take dataset S2 as the example for analysis and make the following observations.

The first observation is that the incorporation of the attribute-weight-entropy regularization technique into the clustering makes our algorithms obtain much better clustering performance in terms of the average classification rate (ACR) and the average normalized mutual information (ANMI). In Table II, WEFCM, CDFCM and KCDFCM show better ACR and ANMI than FCM, KFCM-F, PFCM, and Soft-DKM. For the attribute-weighted clustering algorithms WEFCM, CDFCM, and KCDFCM, the performance in ACR and ANMI is very close. Unsurprisingly, however, the proposed distributed clustering algorithm CDFCM and its kernelization KCDFCM need more iterations (AIN of 41.8 and 53.2, respectively) to reach convergence than the centralized clustering approach WEFCM (AIN of 26.6), because the information exchange between different sensors extends the convergence time of the distributed clustering approaches. Table III lists the final attribute weight assignments of the different attribute-weighted clustering algorithms (WEFCM, CDFCM, and KCDFCM). In order to have a more intuitive understanding of the attribute weight assignment, we further investigate the distribution of the different dimensions/attributes of dataset S2 in Fig. 3. According to the parameter setting (\(\mathbf{T}=[\tau_{km}],\ 1\le k\le 3,\ 1\le m\le 6\)), we know that in cluster-1 the values of attribute-1 (\(\tau_{11}=4\)) are more compact than those of the other attributes. Attribute-1 should therefore be more important and contribute much more than the other attributes when distinguishing cluster-1 from cluster-2 and cluster-3; as a result, a high weight should be assigned to attribute-1 in cluster-1. This is verified by Table III, in which the weight of attribute-1 is much higher than the others in cluster-1 for each attribute-weighted clustering algorithm. The same situation occurs for attribute-2 and attribute-3. All these prove the efficiency of the attribute-weight-entropy regularization technique for data clustering.

TABLE III
ATTRIBUTE WEIGHTS OBTAINED BY THREE ATTRIBUTE-WEIGHTED CLUSTERING ALGORITHMS ON THE WSN-BASED SYNTHETIC DATASET WITH 50 SENSORS

WEFCM
| | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 |
|---|---|---|---|---|---|---|
| Cluster1 | 0.4308 | 0.1213 | 0.0655 | 0.1482 | 0.1379 | 0.0964 |
| Cluster2 | 0.1096 | 0.3309 | 0.0954 | 0.1751 | 0.1575 | 0.1315 |
| Cluster3 | 0.0859 | 0.1273 | 0.3724 | 0.1477 | 0.1406 | 0.1263 |

CDFCM
| | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 |
|---|---|---|---|---|---|---|
| Cluster1 | 0.3629 | 0.1409 | 0.0727 | 0.1443 | 0.1691 | 0.1101 |
| Cluster2 | 0.0927 | 0.3010 | 0.0841 | 0.2180 | 0.1818 | 0.1224 |
| Cluster3 | 0.0899 | 0.1719 | 0.3425 | 0.1229 | 0.1147 | 0.1581 |

KCDFCM
| | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 |
|---|---|---|---|---|---|---|
| Cluster1 | 0.4009 | 0.1279 | 0.0788 | 0.1461 | 0.1413 | 0.1049 |
| Cluster2 | 0.0723 | 0.3370 | 0.1073 | 0.2027 | 0.1151 | 0.1656 |
| Cluster3 | 0.0797 | 0.1386 | 0.3503 | 0.1818 | 0.1247 | 0.1249 |

Another observation, from Table II, is that the three distributed clustering algorithms Soft-DKM, CDFCM, and KCDFCM have a lower average transmission energy consumption (ATEC) than the traditional centralized and parallel clustering methods FCM, WEFCM, KFCM-F, and PFCM. This is reasonable, because the distributed clustering algorithms only exchange a small quantity of data, such as cluster prototypes and attribute weights, whereas the centralized and parallel clustering approaches require moving all the data to one central node. More importantly, the three distributed algorithms yield much more balanced transmission energy consumption among the sensors (AMEC), which leads to a longer network lifetime (the time elapsed until the first sensor node in the WSN depletes its energy). All of these prove the superiority of our algorithms for data clustering in distributed, energy-efficient network applications compared with the traditional centralized and parallel clustering methods.

When comparing the two proposed algorithms with the other distributed clustering algorithm, Soft-DKM, we can observe that Soft-DKM consumes only about half of the transmission energy (ATEC and AMEC) spent by our attribute-weighted distributed clustering approaches. But this saving in transmission energy is achieved at the cost of greatly deteriorated clustering performance (ACR and ANMI). Considering that most of the time the primary target of clustering is still a high classification rate, the great improvement in classification performance bought by the proposed attribute-weighted distributed clustering algorithms at the expense of a little more energy consumption/communication overhead is acceptable. In this sense, the proposed approaches are still preferable to Soft-DKM.

The last interesting finding is that, in this example, the kernel-based algorithms do not provide significant improvements in ACR and ANMI compared with the corresponding traditional approaches. This can be explained by the spherical shapes of the generated data in our experiments: on such datasets, the capability of kernel-based clustering to separate ‘non-spherical’ data consequently shows no distinctive advantage.

As a short summary of the observations above, we can specify the scenario that the proposed algorithms fit best. Basically, when centralized clustering approaches are discouraged by technical constraints such as the volume of the data, or by privacy and security concerns such as the lack of permission to transmit all the data, the collaborative clustering approaches proposed in this paper are a good choice, unless transmission energy consumption is an extremely important consideration. Even in such situations, the proposed approaches remain viable, because we can set all the attribute weights to 1 and stop transferring attribute weights between sensors; the proposed approaches then degenerate into a simple collaborative fuzzy clustering without attribute weights and consume energy comparable to that spent by Soft-DKM. As for the selection between the CDFCM algorithm and its kernelization KCDFCM, it depends on the possible shape of the clusters in the data space: the KCDFCM algorithm is preferred when clustering ‘non-spherical’ data, as shown in the following experiments.

C. ‘Non-spherical’ Synthetic Datasets

Four synthetic datasets with ‘non-spherical’ shaped clusters are considered for further experiments [36]. Their plots are provided in Fig. 4: Fuzzy ‘X’ (N=640, J=8, K=2, M=2), Parabolic (N=960, J=8, K=2, M=2), Ring (N=750, J=5, K=2, M=2), and Zig-zag (N=400, J=4, K=2, M=2). Because the number of peers is relatively small, the simple linear network structure illustrated in Fig. 5 is applied in these experiments. Each peer exchanges messages only with its nearest neighbor(s).
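For illustration, a minimal sketch of the neighbor relation implied by the linear topology of Fig. 5, assuming peers are indexed 0..J-1 along the chain:

def linear_neighbors(J):
    # Peer j talks only to its immediate predecessor and successor;
    # the two end peers of the chain each have a single neighbor.
    return {j: [k for k in (j - 1, j + 1) if 0 <= k < J] for j in range(J)}

print(linear_neighbors(5))
# {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}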

Fig. 4. Four synthetic datasets. (a) Fuzzy ‘X’. (b) Parabolic. (c) Ring. (d) Zig-zag.

Fig. 5. Simple linear network architecture.

Table 4 presents the clustering results of the different clustering algorithms on these four synthetic datasets. For most of the datasets, namely Fuzzy ‘X’, Ring, and Zig-zag, the kernel-based clustering algorithms (KFCM-F and KCDFCM) obtain excellent clustering solutions in ACR and ANMI. For example, on the Ring dataset the KCDFCM algorithm achieves 100% ACR, compared with approximately 50% for the traditional approaches. These results confirm the effectiveness of the kernel method for clustering ‘non-spherical’ shaped data. Similar to the results reported in [36], the kernel-based clustering algorithms do not provide significant improvement over the traditional clustering methods on the Parabolic dataset. This is possibly due to (1) the choice of kernel function and/or (2) the setting of kernel parameters, as discussed in [36][42]. Optimizing kernel parameters is a difficult problem, and some efforts have been made in [43]; extending such methods to distributed fuzzy clustering is still an open problem and a good direction for future work. After all, KCDFCM is still the best algorithm in our comparisons and demonstrates the capability of clustering ‘non-spherical’ datasets collaboratively.
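As background for this comparison, kernel-based fuzzy clustering of the KFCM/KCDFCM family typically replaces the Euclidean distance with a kernel-induced distance in feature space. A minimal sketch with the commonly used Gaussian kernel, whose width sigma is exactly the kind of parameter whose setting is discussed above, is:

import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    # K(x, v) = exp(-||x - v||^2 / (2 sigma^2)), a common kernel choice.
    return np.exp(-np.sum((x - v) ** 2) / (2.0 * sigma ** 2))

def kernel_induced_distance_sq(x, v, sigma=1.0):
    # ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)) for any kernel with K(x, x) = 1.
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

x = np.array([1.0, 0.0])
v = np.array([0.0, 0.0])
print(kernel_induced_distance_sq(x, v, sigma=0.5))

With a well-chosen kernel, points that are inseparable by spherical prototypes in the input space (e.g., the Ring dataset) become separable in the induced feature space.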

TABLE IV
STATISTICS OF DIFFERENT CLUSTERING ALGORITHMS ON FOUR SYNTHETIC DATASETS IN TERMS OF AIN, ACR(%), AND ANMI
(each cell lists AIN / ACR(%) / ANMI)

Data set  | FCM                    | WEFCM                 | KFCM-F                 | PFCM                  | Soft-DKM              | CDFCM                 | KCDFCM
Fuzzy ‘X’ | *43.4 / 50.16 / 0.0008 | 50.6 / 50.90 / 0.0034 | 54.6 / 65.67 / 0.1487  | 85.8 / 50.16 / 0.0008 | 63.4 / 50.12 / 0.0007 | 79.6 / 50.90 / 0.0034 | 84.2 / *67.48 / *0.1921
Parabolic | *12.2 / 88.13 / 0.4759 | 36.6 / 88.44 / 0.4845 | 30.2 / 88.54 / 0.4878  | 28.6 / 88.13 / 0.4759 | 29.4 / 88.10 / 0.4744 | 39.2 / 88.18 / 0.4782 | 42.8 / *88.75 / *0.4944
Ring      | 40.0 / 50.11 / 0.0013  | 44.6 / 52.08 / 0.0022 | *38.6 / 99.59 / 0.9655 | 48.4 / 50.11 / 0.0013 | 46.8 / 50.11 / 0.0013 | 49.2 / 51.33 / 0.0019 | 66.4 / *100.00 / *1.00
Zig-zag   | *15.0 / 53.68 / 0.0026 | 27.2 / 54.84 / 0.0050 | 28.4 / 80.05 / 0.3768  | 33.6 / 53.68 / 0.0026 | 25.0 / 51.42 / 0.0014 | 31.2 / 52.66 / 0.0031 | 44.8 / *84.26 / *0.4406

* The best performance among the group.

D. UCI Machine Learning Datasets

Five real-world datasets selected from the UCI repository [44] are used in these experiments: Iris (N=150, J=3, K=3, M=4), Glass (N=214, J=3, K=6, M=9), Ionosphere (N=351, J=4, K=2, M=33), Haberman (N=306, J=3, K=2, M=3), and Heart (N=267, J=3, K=2, M=44). The linear network structure in Fig. 5 is again applied. From the clustering results in Table 5, it can be seen that our algorithms achieve good performance in ACR and ANMI on these real-world datasets, at the cost of slightly more iterations. The best example is the Iris dataset, where the attribute weight assignment yields good clustering results in ACR (approximately 96%) and ANMI (approximately 0.88). This performance is almost identical to the best one, obtained by the centralized attribute-weighted algorithm (WEFCM), which again confirms the effectiveness of the attribute-weight-entropy regularization technique. As in the WSN-based experiments, the kernel-based method shows no significant improvement on these real-world data.
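As a side note on the reported metrics, the following sketch shows one way such scores can be computed, assuming ACR is the best accuracy over cluster-to-class relabelings and ANMI averages normalized mutual information over peers and runs; the brute-force relabeling below is only practical for small K.

import numpy as np
from itertools import permutations
from sklearn.metrics import normalized_mutual_info_score

def classification_rate(true_labels, pred_labels, K):
    # Best accuracy over all cluster-to-class label permutations
    # (for larger K, the Hungarian method would be preferable).
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best = 0.0
    for perm in permutations(range(K)):
        mapped = np.array([perm[p] for p in pred_labels])
        best = max(best, float(np.mean(mapped == true_labels)))
    return best

true = [0, 0, 1, 1]
pred = [1, 1, 0, 0]
print(classification_rate(true, pred, K=2))        # 1.0 after relabeling
print(normalized_mutual_info_score(true, pred))    # 1.0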

TABLE V
STATISTICS OF DIFFERENT CLUSTERING ALGORITHMS ON THE UCI MACHINE LEARNING DATASETS IN TERMS OF AIN, ACR(%), AND ANMI
(each cell lists AIN / ACR(%) / ANMI)

Data set   | FCM                    | WEFCM                  | KFCM-F                 | PFCM                   | Soft-DKM              | CDFCM                 | KCDFCM
Iris       | *21.2 / 89.33 / 0.7433 | 26.2 / *96.66 / *0.8801 | 31.4 / 92.06 / 0.7773 | 38.6 / 89.33 / 0.7433  | 23.8 / 87.38 / 0.7294 | 30.2 / 95.90 / 0.8705 | 39.6 / 96.18 / 0.8792
Glass      | 56.2 / 42.08 / 0.2974  | 63.8 / 54.39 / 0.4263  | *50.6 / 50.84 / 0.3331 | 107.6 / 42.08 / 0.2974 | 73.4 / 40.50 / 0.2848 | 86.2 / 52.96 / 0.4170 | 95.2 / *55.71 / *0.4590
Ionosphere | *13.9 / 70.94 / 0.1299 | 44.8 / 76.58 / 0.2026  | 16.4 / 73.36 / 0.1828  | 52.6 / 70.94 / 0.1299  | 36.8 / 67.77 / 0.1028 | 51.2 / 75.26 / 0.1961 | 54.4 / *78.59 / *0.2257
Haberman   | *17.1 / 51.96 / 0.0024 | 18.2 / *77.12 / *0.0992 | 18.8 / 69.04 / 0.0304 | 29.4 / 51.96 / 0.0024  | 21.6 / 51.42 / 0.0018 | 26.8 / 74.68 / 0.0610 | 30.2 / 75.40 / 0.0764
Heart      | 41.0 / 51.31 / 0.0052  | 48.4 / 72.88 / 0.0445  | *30.2 / 70.38 / 0.0348 | 67.4 / 51.31 / 0.0052  | 46.2 / 50.24 / 0.0028 | 56.8 / 71.95 / 0.0408 | 63.2 / *76.50 / *0.1286

* The best performance among the group.

E. Text Datasets

For high-dimensional data clustering, we use a popular text dataset, 20 Newsgroups, obtained from [44]. This dataset consists of 20000 files selected from 20 classes labeled by topic name. All data are first preprocessed with the Bow toolkit [45] to remove stop words, stem the remaining words, and eliminate infrequently occurring words. The remaining words in each document are then weighted by the standard tf·idf scheme [39] and used as data attributes. Four test subsets are built with different topic/cluster numbers and word/attribute numbers, as shown in Table 6.
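For readers reproducing this pipeline with current tools, the sketch below approximates the described preprocessing using scikit-learn as a stand-in for the Bow toolkit; the min_df threshold is an illustrative assumption, and stemming, which Bow also performed, would need an extra step (e.g., with NLTK).

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# Fetch the two topics of subset S1 (alt.atheism, misc.forsale).
docs = fetch_20newsgroups(subset="all",
                          categories=["alt.atheism", "misc.forsale"],
                          remove=("headers", "footers", "quotes"))

vectorizer = TfidfVectorizer(stop_words="english",  # remove stop words
                             min_df=20)             # drop rare words (assumed cutoff)
X = vectorizer.fit_transform(docs.data)             # documents x terms, tf-idf weighted
print(X.shape)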

TABLE VI
SUMMARY OF FOUR TEST SUBSETS FROM 20 NEWSGROUPS

Subset | Topics (K)                                                                        | Files number (N) | Words number
S1     | alt.atheism, misc.forsale                                                         | 2000             | 260
S2     | rec.sport.baseball, rec.sport.hockey                                              | 2000             | 250
S3     | alt.atheism, comp.graphics, rec.sport.baseball, talk.politics.guns                | 4000             | 1205
S4     | talk.politics.guns, talk.politics.mideast, talk.politics.misc, talk.religion.misc | 4000             | 1273

Table 7 shows the clustering results of the different clustering algorithms on these four subsets. It can be seen that the CDFCM and KCDFCM algorithms perform well in high-dimensional data clustering. This is particularly evident on subset S1, whose data come from two very different topics. At the same time, the important features (topic-related words) are effectively extracted. Fig. 6 illustrates the word weight assignment obtained by the CDFCM algorithm. Evidently, most words with higher weights are directly related to the corresponding topics (the top 20 words with the highest weights are listed in Table 8). This experiment demonstrates that the proposed algorithm and its kernelized version can handle the clustering of real-world high-dimensional data with large sample sizes.
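A minimal sketch of how such topic-related words can be read off the learned weights, assuming one attribute-weight vector per cluster indexed against the word vocabulary (the function name and data layout here are illustrative, not the paper's implementation):

import numpy as np

def top_words(weights, vocabulary, k=20):
    # Return the k vocabulary words with the largest attribute weights
    # for one cluster, as in Fig. 6 / Table VIII.
    order = np.argsort(weights)[::-1][:k]
    return [vocabulary[i] for i in order]

print(top_words(np.array([0.1, 0.7, 0.2]), ["god", "sale", "card"], k=2))
# ['sale', 'card']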

TABLE VII
STATISTICS OF DIFFERENT CLUSTERING ALGORITHMS ON 20 NEWSGROUPS SUBSETS IN TERMS OF AIN, ACR(%), AND ANMI
(each cell lists AIN / ACR(%) / ANMI)

Data set | FCM                    | WEFCM                   | KFCM-F                | PFCM                  | Soft-DKM              | CDFCM                 | KCDFCM
S1       | *21.4 / 83.62 / 0.5441 | 41.4 / 96.50 / 0.8545   | 28.6 / 87.94 / 0.5890 | 36.6 / 83.62 / 0.5441 | 57.6 / 81.06 / 0.5276 | 63.2 / 95.88 / 0.8139 | 94.6 / *97.64 / *0.8937
S2       | *19.4 / 70.08 / 0.1263 | 52.6 / *83.13 / *0.3447 | 24.2 / 71.44 / 0.1315 | 32.8 / 70.08 / 0.1263 | 48.8 / 69.17 / 0.1148 | 61.4 / 80.26 / 0.3093 | 82.8 / 82.72 / 0.3268
S3       | *23.8 / 76.04 / 0.5121 | 39.4 / 87.16 / 0.6785   | 31.2 / 81.65 / 0.5698 | 44.6 / 76.04 / 0.5121 | 49.6 / 73.82 / 0.4633 | 65.4 / 85.73 / 0.6471 | 81.4 / *89.55 / *0.7220
S4       | *21.4 / 53.71 / 0.2060 | 36.6 / 57.59 / 0.2590   | 38.0 / 59.11 / 0.2738 | 39.2 / 53.71 / 0.2060 | 52.2 / 52.48 / 0.1834 | 57.8 / 57.06 / 0.2511 | 75.4 / *61.36 / *0.2968

* The best performance among the group.



Fig. 6. Word weight calculated by CDFCM algorithm on subset S1. (a) alt.atheism. (b) misc.forsale.

TABLE VIII
TOP 20 WORDS EXTRACTED BY CDFCM CLUSTERING ALGORITHM ON SUBSET S1

Topic        | Extracted words
alt.atheism  | statement, religion, christian, abortion, truth, political, bible, horus, livesey, fact, god, morality, evidence, caltech, rusnews, objective, jesus, belief, islamic, argument
misc.forsale | shipping, pc, card, video, computers, email, price, rochester, excellent, drive, sale, list, phone, udel, gatech, condition, offer, hp, columbia, software

V. CONCLUSIONS

This study focuses on data clustering in distributed P2P networks and proposes a novel collaborative clustering algorithm. The centralized clustering problem is solved in a distributed manner at each peer, collaborating only with neighboring peers. Not only does the clustering performance of the proposed approach match that of the centralized clustering approach, but the proposed method also reduces and balances the communication overhead among peers. Based on the attribute-weight-entropy regularization technique, important features are extracted under the optimal distribution of attribute weights. The kernel method is applied to the proposed collaborative clustering algorithm to meet the needs of ‘non-spherical’ shaped data clustering. Experiments on several synthetic and real-world datasets show that the proposed algorithms yield comparable or better performance relative to other clustering methods.

The results of this paper suggest several directions for future work. The proposed architecture of collaboration over a distributed peer-to-peer network is quite general and could be applied to system modeling [46-48] and control problems [49-50] in distributed environments. In addition, a future improvement could be a partially supervised fuzzy clustering algorithm that uses advanced meta-heuristics and hybrid optimization techniques with fuzzy logic to optimize the clustering objective function. Finally, the proposed algorithm predetermines the same number of clusters at each peer of the network; it could be improved by developing a collaborative approach that estimates the number of clusters from the data.

APPENDIX

Algorithm 2. Generation of WSN-based Synthetic Datasets

Input: The number of sensor nodes J, the number of objects Nj belonging to the j-th sensor, the number of clusters K, the number of data dimensions M, and the minimum per-dimension distance δ between two cluster prototypes.

Output: A set of object sets X = {X1, X2, ..., XJ} for all sensors.

// Generate the cluster prototypes
Randomly generate the first cluster prototype c1 = [c11, c12, ..., c1M];
for k = 2 to K do
    Generate the k-th cluster prototype ck such that |c_lm − c_km| ≥ δ, for all l < k, 1 ≤ m ≤ M;
end for
// Generate the object set Xj for the j-th sensor
for j = 1 to J do
    for n = 1 to Nj do
        Set the cluster index of the current object h = n mod K;
        for m = 1 to M do
            Set x^j_nm = c_hm + r · δ / τ_hm, where r is a random real number drawn from a Gaussian distribution with zero mean and unit standard deviation, and T = [τ_hm] (1 ≤ h ≤ K, 1 ≤ m ≤ M) is the scaling matrix controlling the data dispersion, τ_hm being used for the m-th dimension of the h-th cluster;
        end for
    end for
end for
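For reference, a Python sketch of Algorithm 2 follows. The uniform sampling range for the prototypes is our assumption; the algorithm itself only constrains their pairwise per-dimension distance.

import numpy as np

def generate_wsn_datasets(J, N, K, M, delta, tau, seed=0):
    # N is a length-J list of per-sensor object counts; tau is the K x M
    # dispersion matrix T = [tau_hm]; delta is the minimum per-dimension
    # distance between cluster prototypes.
    rng = np.random.default_rng(seed)
    # Generate K prototypes by rejection sampling; the sampling range
    # 0..K*delta is an assumption made only to leave room for K prototypes.
    prototypes = [rng.uniform(0.0, K * delta, size=M)]
    while len(prototypes) < K:
        cand = rng.uniform(0.0, K * delta, size=M)
        if all(np.all(np.abs(cand - c) >= delta) for c in prototypes):
            prototypes.append(cand)
    prototypes = np.asarray(prototypes)
    # Generate the object set X_j for each sensor: x_nm = c_hm + r*delta/tau_hm.
    X = []
    for j in range(J):
        Xj = np.empty((N[j], M))
        for n in range(N[j]):
            h = n % K                    # cluster index of the current object
            r = rng.standard_normal(M)   # zero mean, unit standard deviation
            Xj[n] = prototypes[h] + r * delta / tau[h]
        X.append(Xj)
    return X

X = generate_wsn_datasets(J=3, N=[100, 100, 100], K=2, M=2,
                          delta=3.0, tau=np.full((2, 2), 6.0))
print([x.shape for x in X])   # [(100, 2), (100, 2), (100, 2)]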

REFERENCES

[1] R. Xu, D.C. Wunsch, “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, vol.16, no.3, pp.645-678, May 2005.
[2] J.W. Han, M. Kamber, “Data mining: concepts and techniques,” Morgan Kaufmann, San Mateo, CA, 2001.
[3] C.L. Philip Chen, Y. Lu, “FUZZ: A fuzzy-based concept formation system that integrates human categorization and numerical clustering,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.27, no.1, pp.79-94, Feb. 1997.
[4] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, vol.1, pp.281-297, 1967.

[5] J.C. Bezdek, “Pattern recognition with fuzzy objective function algorithms,” Plenum, New York, 1981.

[6] K. Krishna, M. Narasimha Murty, “Genetic k-means algorithm,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.29, no.3, pp. 433-439, Jun. 1999.

[7] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, no.7, pp.881-892, Jul. 2002.

[8] J. Yu, Q. Cheng, H. Huang, “Analysis of the weighting exponent in the FCM,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.34, no.1, pp.634-639, Feb. 2004.

[9] L. Zhu, F.L. Chung, S. Wang, “Generalized fuzzy c-means clustering algorithm with improved fuzzy partitions,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no.3, pp.578-591, Jun. 2009.

[10] L. Chen, C.L. Philip Chen, W. Pedrycz, “A Gradient-descent-based Approach for Transparent Linguistic Interface Generation in Fuzzy Models,” IEEE Transactions on Systems, Man, and Cybernetics, Part B-Cybernetics, Vol.40, No.5, pp.1219-1230, Oct. 2010.

[11] D.T. Anderson, A. Zare, S. Price, “Comparing Fuzzy, Probabilistic, and Possibilistic Partitions Using the Earth Mover's Distance,” IEEE Transactions on Fuzzy Systems, vol.21, no.4, pp.766-775, Aug. 2013.

[12] W. Pedrycz, “Collaborative fuzzy clustering,” Pattern Recognition Letters, vol.23, pp.675-686, Dec. 2002.

[13] C. L. Philip Chen, J. Zhou, W. Zhao, “A Real-time Vehicle Navigation Algorithm in Sensor Network Environments”, IEEE Transactions on Intelligent Transportation Systems, vol.13, no.4, pp.1657-1666, Dec. 2012.

[14] H. Kargupta, K. Sivakumar, “Data mining: Next generation challenges and future directions,” MIT/AAAI Press, Cambridge, MA, USA, 2004.

[15] K.M. Hammouda, “Distributed document clustering and cluster summarization in peer-to-peer environments,” PhD thesis, University of Waterloo, Department of Electrical and Computer Engineering, 2007.

[16] H. Kargupta, P. Chan, “Advances in distributed and parallel knowledge discovery,” MIT/AAAI Press, Cambridge, MA, USA, 2000.

[17] W. Kim, “Parallel clustering algorithms: survey,” CSC 8530 Parallel Algorithms, 2009.

[18] S. Kantabutra, A. Couch, “Parallel K-means Clustering Algorithm on NOWs,” NECTEC Technical Journal, vol.1, no.6, pp.243-248, Jan. 2000.

[19] B. Zhang, M. Hsu, G. Forman, “Accurate recasting of parameter estimation algorithms using sufficient statistics for efficient parallel speed-up: Demonstrated for center-based data clustering algorithms,” In Proc. of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp.243-254, Sep. 2000.

[20] T. Kwok, K. A. Smith, S. Lozano, D. Taniar, “Parallel fuzzy c-means clustering for large datasets,” in Proc. of the 8th International Euro-Par Conference on Parallel Processing, pp.365-374, Aug. 2002.

[21] K. Kerdprasop, N. Kerdprasop, “A Lightweight Method to Parallel K-means Clustering,” International Journal of Mathematics and Computers in Simulation, vol.4, no.4, pp.144-153, 2010.

[22] W. Pedrycz, P. Rai, “Collaborative clustering with the use of fuzzy c-means and its quantification,” Fuzzy Sets and Systems, vol.159, no.18, pp.2399-2427, Sep. 2008.

[23] S. Datta, C. Giannella, H. Kargupta, “K-means clustering over a large, dynamic network,” In Proc. of the SIAM International Conference on Data Mining, pp.153-164, 2006.

[24] R. Kashef, “Cooperative clustering model and its applications,” PhD thesis, University of Waterloo, Department of Electrical and Computer Engineering, 2008.

[25] P. A. Forero, A. Cano, G. B. Giannakis, “Distributed clustering using wireless sensor networks,” IEEE Journal of Selected Topics in Signal Processing, vol.5, no.4, pp.707-724, Aug. 2011.

[26] L.F.S. Coletta, L. Vendramin, E.R. Hruschka, R.J.G.B. Campello, W. Pedrycz, “Collaborative fuzzy clustering algorithms: Some refinements and design guidelines,” IEEE Transactions on Fuzzy Systems, vol.20, no.3, pp.444-462, Jun. 2012.

[27] H.P. Kriegel, P. Kröger, A. Zimek, “Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering,” ACM Transactions on Knowledge Discovery from Data, vol.3, no.1, pp.1-58, Mar. 2009.

[28] E.Y. Chan, W.K. Ching, M.K. Ng, J.Z. Huang, “An optimization algorithm for clustering using weighted dissimilarity measures,” Pattern Recognition, vol.37, no.5, pp.943-952, May 2004.

[29] J.Z. Huang, M.K. Ng, H. Rong, Z. Li, “Automated variable weighting in k-means type clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, no.5, pp.657-668, May 2005.

[30] G.J. Gan, J.H. Wu, Z.J. Yang, “A fuzzy subspace algorithm for clustering high dimensional data,” Lecture Notes in Computer Science, vol.4093, pp.271-278, Aug. 2006.

[31] L.P. Jing, M.K. Ng, J.Z. Huang, “An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data,” IEEE Transactions on Knowledge and Data Engineering, vol.19, no.8, pp.1026-1041, Aug. 2007.

[32] J. Zhou, C. L. Philip Chen, “Attribute weighted entropy regularization in fuzzy c-means algorithm for feature selection,” in Proc. of IEEE International Conference on System Science and Engineering, pp.59-64, Jun. 2011.

[33] Z. Deng, K.S. Choi, F.L. Chung, S. Wang, “Enhanced soft subspace clustering integrating within-cluster and between-cluster information,” Pattern Recognition, vol.43, no.3, pp.767-781, Mar. 2010.

[34] K.R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks, vol.12, no.2, pp.181-201, Mar. 2001.

[35] F. Camastra, A. Verri, “A novel kernel method for clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, no.5, pp.801-805, May 2005.

[36] D. Graves, W. Pedrycz, “Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study,” Fuzzy Sets and Systems, vol.161, no.4, pp.522-543, Feb. 2010.

[37] L. Chen, C.L. Philip Chen, M. Lu, “A multiple-kernel fuzzy c-means algorithm for image segmentation,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.41, no.5, pp.1263-1274, Oct. 2011.

[38] R. Herbrich, “Learning kernel classifiers,” MIT Press, Cambridge, MA, 2002.

[39] Hsin-Chien Huang, Yung-Yu Chuang, Chu-Song Chen, “Multiple kernel fuzzy clustering,” IEEE Transactions on Fuzzy Systems, vol.20, no.1, pp.120-134, Feb. 2012.

[40] W. R. Heinzelman, A. P. Chandrakasan, H. Balakrishnan, “An application-specific protocol architecture for wireless microsensor networks,” IEEE Transactions on Wireless Communications, vol.1, no.4, pp.660-670, Oct. 2002.

[41] J. Zhou, C. L. Philip Chen, L. Chen, W. Zhao, “A User-customizable Urban Traffic Information Collection Method based on Wireless Sensor Networks”, IEEE Transactions on Intelligent Transportation Systems, vol.14, no.3, pp.1119-1128, Sep. 2013.

[42] N.R. Pal, K. Sarkar, “What and when can we gain from the kernel versions of c-means algorithm?” IEEE Transactions on Fuzzy Systems, in print, 2013.

[43] J. Huang, X. Chen, P. C. Yuen, J. Zhang, W. S. Chen, J. H. Lai, “Kernel parameter optimization of kernel-based LDA methods,” in Proc. of IEEE International Joint Conference on Neural Networks, pp.3840-3846, Jun. 2008.

[44] K. Bache, M. Lichman (2013), “UCI Machine Learning Repository,” Irvine, CA: University of California, School of Information and Computer Science, [Online]. Available: http://archive.ics.uci.edu/ml/

[45] A. McCallum (1998), “Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering,” [Online]. Available: http://www.cs.cmu.edu/~mccallum/bow/

[46] C.L. Philip Chen, S.R. LeClair, Y.H. Pao, “An incremental adaptive implementation of functional-link processing for function approximation, time-series prediction, and system identification,” Neurocomputing, vol.18, no.1-3, pp.11-31, Jan. 1998.

[47] Z. Liu, C.L. Philip Chen, Y. Zhang, H.X. Li, “A Three-domain Fuzzy Wavelet System for Simultaneous Processing of Time-Frequency Information and Fuzziness,” IEEE Transactions on Fuzzy Systems, vol.21, no.1, pp.176-183, Feb. 2013.

[48] Y.-Y. Lin, J.-Y. Chang, N.R. Pal, C.-T. Lin, “A Mutually Recurrent Interval Type-2 Neural Fuzzy System (MRIT2NFS) With Self-Evolving Structure and Parameters,” IEEE Transactions on Fuzzy Systems, vol.21, no.3, pp.492-509, Jun. 2013.

[49] C.L. Philip Chen, Y.H. Pao, “An integration of neural network and rule-based systems for design and planning of mechanical assemblies,” IEEE Transactions on Systems Man and Cybernetics, vol.23, no.5, pp.1359-1371, Sep.-Oct. 1993.


[50] S.C. Tong, H.X. Li, “Fuzzy adaptive sliding-mode control for MIMO nonlinear systems,” IEEE Transactions on Fuzzy Systems, vol.11, no.3, pp.354-360, Jun. 2003.

Jin Zhou (S’11) received the B.S. degree in computer science and technology from Shandong University, Jinan, China, in 1998, and the M.S. degree in software engineering from Shandong University, Jinan, China, in 2001. He is currently working toward the Ph.D. degree in the Department of Computer and Information Science, University of Macau, Macau, China. His current research interests include intelligent transportation systems, computational intelligence, and other machine learning techniques and their applications.

C. L. Philip Chen (S’88–M’88–SM’94–F’07) received the M.S. degree from the University of Michigan, Ann Arbor, in 1985 and the Ph.D. degree from Purdue University, West Lafayette, IN, in 1988, both in electrical engineering. After having worked in the US for twenty-three years as a tenured professor, a department head, and an associate dean in two different departments and universities, he is currently a Chair Professor of the Department of Computer and Information Science and the Dean of the Faculty of Science and Technology at the University of Macau, Macau, China.

Dr. Chen is a Fellow of the IEEE, of the American Association for the Advancement of Science (AAAS), and of the HKIE. He is currently the President of the IEEE Systems, Man, and Cybernetics Society. In addition, he has served on various committees, including the IEEE Fellows Committee and the Conference Integrity Committee. He is an Accreditation Board for Engineering and Technology (ABET) Education Program Evaluator for Computer Engineering, Electrical Engineering, and Software Engineering programs. His research interests include computational intelligence, systems, and cybernetics.

Long Chen (M’11) received the B.S. degree in information sciences from Peking University, Beijing, China, in 2000, the M.S.E. degree from the Institute of Automation, Chinese Academy of Sciences, in 2003, the M.S. degree in computer engineering from the University of Alberta, Canada, in 2005, and the Ph.D. degree in electrical engineering from the University of Texas at San Antonio, USA, in 2010. From 2010 to 2011 he was a postdoctoral fellow at the University of Texas at San Antonio. Dr. Chen is currently an assistant professor in the Department of Computer and Information Science, University of Macau, China. His current research interests include computational intelligence, Bayesian methods, and other machine learning techniques and their applications. He has handled publication matters for many IEEE conferences and was the Publications Co-chair of the IEEE International Conference on Systems, Man and Cybernetics (SMC) 2009.

Han-Xiong Li (S’94–M’97–SM’00–F’11) received the B.E. degree in aerospace engineering from the National University of Defense Technology, China, the M.E. degree in electrical engineering from Delft University of Technology, The Netherlands, and the Ph.D. degree in electrical engineering from the University of Auckland, New Zealand. Currently, he is a professor in the Department of Systems Engineering and Engineering Management, City University of Hong Kong. Over the last twenty years, he has worked in different fields, including military service, industry, and academia. He has published over 140 SCI journal papers with an h-index of 25. His current research interests include system intelligence and control, integrated process design and control, and distributed parameter systems with applications to electronics packaging.

Dr. Li serves as an Associate Editor of the IEEE Transactions on Systems, Man, and Cybernetics, Part B, and the IEEE Transactions on Industrial Electronics. He was awarded the Distinguished Young Scholar (Overseas) award by the China National Science Foundation in 2004, a Chang Jiang Professorship by the Ministry of Education, China, in 2006, and a national professorship in the China Thousand Talents Program in 2010. He serves as a distinguished expert for the Hunan Government and the China Federation of Returned Overseas Chinese. He is a Fellow of the IEEE.