Method for increasing the computation speed of an unsupervised learning approach for data clustering

Mitchell Yuwono, Steven W. Su, Bruce Moulton, Hung Nguyen
Centre of Health Technology
University of Technology, Sydney
Sydney, Australia

[email protected]

Abstract—Clustering can be especially effective where the data is irregular, noisy and/or not differentiable. A major obstacle for many clustering techniques is that they are computationally expensive, and hence limited to smaller data volumes and dimensions. We propose a lightweight swarm clustering solution called Rapid Centroid Estimation (RCE). Based on our experiments, RCE significantly shortens the optimization time of its predecessors, Particle Swarm Clustering (PSC) and Modified Particle Swarm Clustering (mPSC). Our experimental results show that, on benchmark datasets, RCE generally produces better clusters than PSC, mPSC, K-means and Fuzzy C-means. On the thyroid dataset, RCE produced clusters with 71% purity on average in 14.3 seconds, compared with average purities of 62% and 55% for K-means and Fuzzy C-means respectively.

Keywords- Particle Swarm Optimization; Clustering; Statistical Analysis; Complexity Analysis.

I. INTRODUCTION

Clustering can be viewed as an exploratory data analysis tool. An objective of clustering is to identify parts of the data that have high degrees of similarity with other parts of the data, and to group the similar parts together into clusters. Similarity can be measured in many ways, such as Euclidean distance, Mahalanobis distance, cosine similarity, and Pearson correlation [1].

Clustering techniques are increasingly considered to be one of the key components when performing analyses on very large amounts of multidimensional data. In order to analyze such data sets, clustering methods are often used as an integral tool for data preprocessing [1]. This is especially the case when data is recorded from multiple sources in an uncontrolled environment.

Particle swarm optimization (PSO) is a stochastic optimization approach originally proposed by Kennedy & Eberhart in 1995. It was inspired by the behavior of flocks of birds and schools of fish [2]. However, a known problem with PSO is that it can suffer from stagnation when particles prematurely converge on particular regions of the potential solution space. Regrouping Particle Swarm Optimization (RegPSO) was recently proposed by Evers & Ghalia as an approach to overcome this stagnation problem [3].

Recent research at the University of Technology, Sydney has resulted in several implementations of PSO for solving search and optimization problems [4, 5, 6]. A variant of RegPSO was recently used to cluster head movement data [4] and fall data [5] with promising results. The particle structure, seeding and swarming strategies were similar to those of Van Der Merwe & Engelbrecht, where each particle represents a complete centroid combination [7]. The research suggested that optimal clusters could be produced using RegPSO, but that it was too slow for our requirements in processing higher dimensional data.

Data clustering using Particle swarm optimization was first proposed by Van Der Merwe & Engelbrecht in 2003 with promising results [7]. Particle Swarm Clustering (PSC), a PSO algorithm specially designed to optimize clustering problems was proposed by Cohen & de Castro in 2006 [8]. Inspired by social interaction of humans in a global neighborhood, PSC organizes data-points into clusters based on the interdependence of each particle. Cohen & de Castro reported that PSC is superior to K-means on benchmark datasets [8].

A modified PSC algorithm called Modified PSC (mPSC) was proposed by Szabo in 2010 [9]. Arguing that the notion of velocity is not appropriate in the context of a social neighborhood, mPSC eliminates the velocity term and inertia weight from the update procedure. The algorithm was reported to reduce computation time while preserving cluster quality. It was conceded, however, that mPSC still suffers from a long optimization time, similar to its predecessor.

In this paper we propose a lightweight modification of mPSC that we call Rapid Centroid Estimation (RCE). We developed this new variant after using and reviewing PSC and mPSC.

The paper is organized as follows. Section I provides an introductory background to clustering, PSO, PSC, mPSC and RCE. Section II presents a short overview on the classic PSC and mPSC algorithms. Section III critically evaluates some conceptual issues on PSC and proposes our improved algorithm, RCE. A comparative study between algorithms is given in Section IV, including measurements of computation time and performance on benchmark datasets. Conclusions and future research directions are given in Section V.


II. OVERVIEW OF THE PSC AND mPSC ALGORITHMS

A. Particle Swarm Clustering (PSC)

According to [8], Particle Swarm Clustering (PSC) can be viewed as a special modification of PSO devised specifically for clustering tasks. This is in contrast to the general implementation of PSO, where each particle represents a candidate solution. In PSC, each particle represents only a fraction of a solution: a cluster centroid prototype. It follows that the optimal set of centroids can only be represented by the swarm as a whole.
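The following Python fragment (illustrative only; the array names and sizes are ours, not the authors') contrasts the two representations:

import numpy as np

# Hypothetical illustration, assuming n_clusters centroids in a k-dimensional space.
n_clusters, k, swarm_size = 3, 4, 10

# Classic PSO clustering (Van Der Merwe & Engelbrecht): each particle encodes a
# complete set of centroids, so the swarm holds many competing candidate solutions.
pso_swarm = np.random.rand(swarm_size, n_clusters, k)

# PSC: each particle is a single centroid prototype (a k-vector); the whole swarm
# together forms one candidate partition of the data.
psc_swarm = np.random.rand(n_clusters, k)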

The PSC optimization strategy can be represented as a star-like, globally interdependent neighborhood, illustrated in Figure 1. In this neighborhood, the slightest coordinate change of a particle triggers a global reaction involving every other particle in the swarm, naturally preserving the gravitational equilibrium of the neighborhood. PSC does not rely on any fitness function because particles naturally accelerate towards the center of gravity of the natural data cluster. PSC also poses a smaller computational burden than classic PSO clustering.

The updating rule of PSC can be summarized as follows.

For each input pattern j, the position of particle i is updated using (1)-(3). Note that the particle velocity is bounded by ±vmax.

$v_i(t+1) = \omega(t)\, v_i(t) + \varphi_1 \otimes X_i(t) + \varphi_2 \otimes Y_i(t) + \varphi_3 \otimes Z_i(t)$   (1)

$v_i \in [-v_{max}, v_{max}]$   (2)

$x_i(t+1) = x_i(t) + v_i(t+1)$   (3)

Here $X_i(t)$, $Y_i(t)$, and $Z_i(t)$ denote the cognitive (4), social (5) and self-organizing (6) terms:

$X_i(t) = p_i(t) - x_i(t)$   (4)

$Y_i(t) = g_j(t) - x_i(t)$   (5)

$Z_i(t) = y_j - x_i(t)$   (6)

where $\omega(t)$ is the inertia weight, which decreases geometrically; $\varphi_k$ are uniform random numbers with $0 \le \varphi_k \le 1$; $x_i(t)$, $v_i(t)$, and $p_i(t)$ denote the position, velocity and best position of particle i in relation to input pattern j; and $g_j(t)$ is the position of the particle that has been closest to input pattern j. The pseudocode of the algorithm is shown in Figure 2.
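As a concrete illustration of update rules (1)-(6), the following Python sketch applies one PSC update for a single particle and one input pattern. The function name and signature are ours, not the authors' reference implementation.

import numpy as np

def psc_update(x_i, v_i, p_i, g_j, y_j, omega, v_max):
    """One PSC velocity/position update following (1)-(6); illustrative sketch."""
    k = x_i.shape[0]
    phi1, phi2, phi3 = (np.random.rand(k) for _ in range(3))
    X = p_i - x_i                 # cognitive term (4)
    Y = g_j - x_i                 # social term (5)
    Z = y_j - x_i                 # self-organizing term (6)
    v_new = omega * v_i + phi1 * X + phi2 * Y + phi3 * Z   # velocity update (1)
    v_new = np.clip(v_new, -v_max, v_max)                  # velocity clamp (2)
    return x_i + v_new, v_new                               # position update (3)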

B. Modified Particle Swarm Clustering (mPSC)

Szabo proposed a modification to the original PSC, called the Modified PSC (mPSC), in 2010 [9]. Szabo argues that, in the context of man's ability to process knowledge, the use of "velocity memory" in equation (3) is irrelevant. According to Kennedy, equation (3) represents a continuous change in mental status, opinion, belief, and behavior. Based on this concept, Szabo proposes that the velocity term be replaced by a perturbation Δx (8), which represents a small change in the behavior of the individual. This perturbation is contributed by the experience of the individual (individual cognition) and by the influence of the social environment (social interaction) in which the individual is inserted (7).

Algorithm S = PSC(dataset, max_iteration, vmax, nc, ω)
  Initialize nc particles, randomize x, and initialize v to zeros
  Calculate distances of p, g for each particle and each datum
  while t < max_iteration
    for each datum y_j
      Update the distance matrix and find the closest particle:
        D(i,j) = d(y_j, x_i)  ∀ i, j
        I = argmin_i d(y_j, x_i),  i ∈ {1, 2, ..., nc}
      Update the personal best and global best of that particle:
        p_I(t+1) = x_I(t)  if d(y_j, x_I(t)) < d(y_j, p_I(t)),  otherwise p_I(t)
        g_j(t+1) = x_I(t)  if d(y_j, x_I(t)) < d(y_j, g_j(t)),  otherwise g_j(t)
      Update velocity and position using (1)-(3):
        v_I(t+1) = ω(t) v_I(t) + φ1 ⊗ X_I(t) + φ2 ⊗ Y_I(t) + φ3 ⊗ Z_I(t)
        v_I ∈ [-vmax, vmax]
        x_I(t+1) = x_I(t) + v_I(t+1)
    end
    Find the winning particle (the particle with the least Euclidean distance to an input pattern):
      x_most_win(t) = x_i(t) with minimal d(p_i(t), x_i(t)),  ∀ i
    for each particle x_i
      if x_i is not close to any data point
        move x_i towards the winning particle using (1)-(3):
          v_i(t+1) = ω(t) v_i(t) + φ4 ⊗ (x_most_win(t) − x_i(t))
          x_i(t+1) = x_i(t) + v_i(t+1)
      end
    end
    ω(t+1) = 0.95 ω(t)
    t = t + 1
  end

Figure 2. PSC Algorithm

Figure 1. Global Neighborhood Scheme used in PSC [8, 9]


$\Delta x_i(t) = \varphi_1 \otimes X_i(t) + \varphi_2 \otimes Y_i(t) + \varphi_3 \otimes Z_i(t)$   (7)

$x_i(t+1) = x_i(t) + \Delta x_i(t)$   (8)

mPSC improves on the PSC algorithm by eliminating the need for the inertia weight ω and the velocity clamp, increasing speed and reducing computational cost without diminishing cluster quality.
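A minimal Python sketch of the velocity-free update (7)-(8) for one particle and one input pattern is shown below; the names are illustrative and this is not the authors' reference implementation.

import numpy as np

def mpsc_update(x_i, p_i, g_j, y_j):
    """One mPSC position update following (7)-(8); illustrative sketch."""
    k = x_i.shape[0]
    phi1, phi2, phi3 = (np.random.rand(k) for _ in range(3))
    # Perturbation built from cognitive, social and self-organizing terms (7).
    delta_x = phi1 * (p_i - x_i) + phi2 * (g_j - x_i) + phi3 * (y_j - x_i)
    return x_i + delta_x   # position update (8)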

III. RAPID CENTROID ESTIMATION

A. Observations regarding PSC

Szabo has made a contribution to the original PSC by addressing computational time reduction [9], and provides a comparative table for PSC and mPSC on benchmark datasets. However, although Szabo has successfully reduced the overall computation time by eliminating the velocity update, it seems further improvements are possible. We suspect that the high computational load of the PSC algorithm may stem in part from characteristics of the original formulation of the algorithm itself. Referring to the algorithm in Figure 2, the following observations can be made:

1) In order to escape the local minima trap, for each iteration, a corresponding particle position is updated for each datapoint. This means a position update occurs j times per iteration, where j is the number of datapoints in the set.

2) Every change of particle position requires the distance matrix to be updated. This means the distance matrix is also updated j times per iteration.

3) There is no global minimum computation to indicate the end of optimization.

In the original PSC algorithm proposal, a single iteration therefore demands that particle positions be updated j times. The total time complexity for n iterations can be approximated as follows. Given that there are i particles in a swarm and j elements of k-dimensional data in the set, the time complexity of the algorithm, O_PSC, for n iterations can be approximated by (9).

$O_{PSC}(n) = n \cdot j \cdot \left( O(i \cdot j \cdot k) + O(j \cdot k) + O(k) \right)$   (9)

where

O(k) approximates the complexity of updating each position. It consists mainly of basic operations such as elementary additions, subtractions, and multiplications, applied for each dimension.

O(j*k) approximates the updating of each k-dimensional element in the set, including finding minimum distance, personal and global best update.

O(i*j*k) approximates the updating of the distance matrix. In order to update the distance matrix, calculation for each data point j involves i particles with k dimension.

Given that the critical contributor to complexity appears to be the updating of the distance matrix, this observation is consistent with Szabo's report of only a small reduction in computation time for mPSC compared to PSC [9]. As seen in (9), the time complexity of the PSC position update, O(k), is small compared to that of the distance matrix update, O(i*j*k).

B. Rapid Centroid Estimation

We propose that a reduction in the frequency of distance matrix updates would significantly reduce the computation time. A sensible way to reduce the distance matrix update frequency is to reduce the frequency of movement updates.

In order to reduce the time complexity of the PSC and mPSC algorithm, we propose a modified version of mPSC we call the Rapid Centroid Estimation (RCE).

RCE is broadly analogous to a concept of individual decision-making according to expected utility. According to the expected utility hypothesis, decisions are made based on the evaluation of utility and risk of every combination of available choices [10]. In every decision made there are consequences which will affect not only the decision maker but also the neighborhood to which the decision maker interacts. The decision maker will also have a preference towards a specific choice which is subjectively less risky.

In the original form of mPSC, the term position x is also analogous to decision-making behavior of an individual [9]. Δx is the perturbation from his/her personal experience and the influence of the environment that causes the individual to make a decision to change his/her behavior. However, in the originally proposed mPSC algorithm, behavior is updated each time the individual encounters new information in the neighborhood [8, 9]. This results in unnecessary behavioral updates which undermine the optimization efficiency.

We propose that the mPSC update scheme is too naïve. It seems to us that a reasonable alternative approach would use an algorithm that does not change behavior every time new information is presented in the neighborhood. Further, one of the propositions of the expected utility hypothesis states that, in order to make a constructive decision, an individual must first assess the utilities and risks posed by all the problems and opportunities presented [10]. Inspired by this paradigm, we incorporate this mindset in our update scheme. We propose the following amendments to the algorithm:

1) For each iteration, each particle position is updated only once. This happens after all possible data points which are closer to that particle have been subjectively considered by the particle. This means position update occurs only i times per iteration, where i is the number of particles in the set.

2) The distance matrix and best positions are updated after all particle positions are updated. This means the distance matrix is updated only once per iteration.

3) A global minimum computation is defined to store the best position combination and to stop the optimization when a long stagnation is detected. Stagnation is detected when the fitness gradient stays above −ε, i.e. the improvement per iteration is smaller than the stagnation threshold ε, for more than s_max consecutive iterations (a minimal sketch of this stopping rule is given below).
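The sketch below shows the stagnation counter described in amendment 3. The function name and the reset-to-zero behaviour follow our reading of the pseudocode in Figure 3 rather than any published reference code.

def update_stagnation(f_new, f_prev, s_c, eps=1e-4):
    """Stagnation counter: grows while the fitness improvement is smaller than eps;
    optimization stops once s_c exceeds s_max (illustrative sketch)."""
    if f_new - f_prev > -eps:   # improvement smaller than the threshold
        return s_c + 1
    return 0                    # meaningful improvement: reset the counter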


Implementing the above modifications to PSC, the methods for calculating the terms known as Cognitive (10), Social (11), Self-organizing (12), and Best position (13) are redefined as follows:

$X_i(t) = p_i(t) - x_i(t)$   (10)

$Y_i(t) = \frac{1}{N_i} \sum_{\forall j,\; \hat{y}_j \in x_i(t)} \varphi_{i,j} \otimes \left( g_j(t) - x_i(t) \right)$   (11)

$Z_i(t) = \frac{1}{N_i} \sum_{\forall j,\; y_j \in x_i(t)} \varphi_{i,j} \otimes \left( y_j - x_i(t) \right)$   (12)

$M(t) = \left[ x_1^{best}\; x_2^{best}\; \cdots\; x_{n_c}^{best} \right]$   (13)

where $\varphi_{i,j}$ is the subjectivity level towards an input pattern, modeled using uniform random numbers $0 \le \varphi_{i,j} \le 1$; $x_i(t)$ and $p_i(t)$ denote the position and best position of particle i in relation to input pattern j; $N_i$ is the number of data points currently assigned to particle i; $g_j(t)$ represents the position of the particle that has been closest to input pattern j; and $M(t)$ represents the best position combination that has achieved the global minimum according to a given fitness function f (e.g. the sum of Euclidean distances).

Δx is redefined as follows:

$\Delta x_i(t+1) = w(t)\, \Delta x_i(t) + \varphi_1 \otimes X_i(t) + Y_i(t) + Z_i(t)$   (14)

$x_i(t+1) = x_i(t) + \Delta x_i(t+1)$   (15)

where w(t) is the inertia weight. In this algorithm we propose that the inertia weight be initialized at 0.8 and decreased geometrically at each iteration.
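To make the redefined update concrete, the following Python sketch applies (10)-(15) to a single particle. It is a minimal illustration under the reconstruction above (variable and function names are ours, not the authors'), assuming the data points and g_j(t) positions associated with the particle's current cluster have already been collected.

import numpy as np

def rce_update(x_i, dx_i, p_i, members, g_members, w):
    """One RCE position update following (10)-(15); illustrative sketch.
    members:   (N_i x k) data points currently assigned to particle i
    g_members: (N_i x k) positions g_j(t) associated with those data points"""
    k = x_i.shape[0]
    phi1 = np.random.rand(k)
    X = p_i - x_i                                           # cognitive term (10)
    if len(members) > 0:
        phi_ij = np.random.rand(len(members), k)            # subjectivity levels
        Y = np.mean(phi_ij * (g_members - x_i), axis=0)     # social term (11)
        Z = np.mean(phi_ij * (members - x_i), axis=0)       # self-organizing term (12)
    else:
        Y = Z = np.zeros(k)
    dx_new = w * dx_i + phi1 * X + Y + Z                    # perturbation (14)
    return x_i + dx_new, dx_new                              # position update (15)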

The RCE algorithm is given in Figure 3.

The effect of the algorithm amendment can be visualized when the behavioral change is projected to a 2-Dimensional space. A three-class artificial dataset was used to draw the movement trajectories of particles in Figure 4 and Figure 5.

The employment of the consequence consideration scheme allows RCE to achieve higher optimization efficiency than its predecessors. The new movement update and stagnation stop scheme also make RCE converge to the global minimum faster than PSC and mPSC. The graph of fitness function versus iteration can be seen in Figure 6.

Compared with the trajectories of mPSC in Figure 4, the RCE movements are more precise and effective. Figure 5 shows that, on this particular artificial dataset, RCE reaches convergence after a few movement updates, whereas mPSC reaches the optimal location only after numerous movement updates, most of which are unnecessary.

Algorithm S = RCE(dataset, max_iter, s_max, ε, nc)
  Initialize nc particles, randomize x
  Calculate distances of p, g for each particle and each datum
  while t < max_iter && sc < s_max
    Update the distance matrix:
      D(i,j,t) = d(y_j, x_i)  ∀ i, j
    Find the closest data point for each particle:
      [Dxmin, Ixmin] = min over j of D(i,j,t)
    Find the closest particle for each data point:
      [Dymin, Iymin] = min over i of D(i,j,t)
    Update p_i(t), g_j(t), and M(t):
      p_i(t+1) = y_Ixmin(i)  if Dxmin,i(t) < Dxmin,i(t−1),  otherwise p_i(t),  ∀ i
      g_j(t+1) = x_Iymin(j)  if Dymin,j(t) < Dymin,j(t−1),  otherwise g_j(t),  ∀ j
      M(t+1) = [x_i(t) : ∀ i]  if f(x_i(t) : ∀ i) < f(M(t)),  otherwise M(t)
    Increment sc if the fitness gradient is higher than −ε:
      sc = sc + 1  if f(M(t+1)) − f(M(t)) > −ε,  otherwise sc = 0
    Find the winning particle (the particle with the least Euclidean distance to an input pattern):
      x_most_win(t) = x_i(t) with minimal d(p_i(t), x_i(t)),  ∀ i
    for each particle x_i
      Get the elements which are members of particle i (cluster centroid):
        y_i^cluster = { y : ∀ y ∈ x_i(t) },  N_i = size(y_i^cluster)
      Calculate the position update using (15) if N_i is greater than zero, otherwise redirect the trajectory towards the most_win particle coordinate:
        x_i(t+1) = x_i(t) + Δx_i(t+1)  if N_i > 0,  otherwise x_i(t) + φ5 ⊗ (x_most_win(t) − x_i(t))
    end
    w(t+1) = 0.95 w(t)
    t = t + 1
  end

Figure 3. RCE Algorithm


A plot showing Fitness function versus Iteration on the Iris dataset is given in Figure 6.

C. RCE Time Complexity

Given that there are i particles in a swarm and j elements of k-dimensional data in the set, the time complexity of the algorithm, O_RCE, for n iterations can be approximated using the same method as in III.A, giving (15). Since the coordinate update involves i additional loops to assess the data points in each cluster, the new time complexity of the coordinate update can be approximated by O(i*j*k). Since one distance matrix update is required after the position updates, the time complexity of an iteration is approximated by (i+1)*O(i*j*k).

$O_{RCE}(n) = n \cdot (i+1) \cdot O(i \cdot j \cdot k)$   (15)

Comparing (15) with (9), RCE is seen to have a lower level of complexity than PSC.
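As a rough illustration under the approximations above (the numbers are chosen for exposition and are not measurements): with i = 3 particles, j = 150 data points and k = 4 dimensions, the dominant per-iteration cost of PSC from (9) is j·O(i·j·k) = 150·O(1800), whereas from (15) RCE performs (i+1)·O(i·j·k) = 4·O(1800). Per iteration, RCE therefore carries out on the order of j/(i+1) ≈ 37 times less distance-matrix work.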

With the additional stopping criterion based on the fitness gradient of the global minimum, the actual number of iterations n is most likely less than the maximum number of iterations, as can be seen in Figure 6.

IV. PERFORMANCE EVALUATION

A. Optimization Speed

In order to assess the optimization speed of each algorithm, synthetically generated datasets are used. For all experiments, the PSC, mPSC, and RCE configurations are set as follows: the maximum number of iterations for PSC and mPSC is set to 25; the initial inertia weight is set to 0.90 with a decay rate of 0.95; RCE is set to a maximum of 100 iterations and optimizes continuously until the global minimum is reached (when f(M(t+1)) − f(M(t)) > −ε), after which it takes 10 additional iterations before stopping. ε is set to 1.0e-4. The fitness function that is minimized is the sum of Euclidean distances.

The first test is a simple clustering task on a 2-class, 2-dimensional dataset of varying volume. The clustering time with respect to volume is given in Table I. Table II shows the fitted polynomial regression of time versus volume. A plot of the experimental result is given in Figure 7.

The second test is another simple clustering task on a 2-class, 100-point dataset of varying dimensionality. The clustering time with respect to dimensionality is given in Table III. Table IV shows the fitted linear regression versus dimensionality for each algorithm. A plot of the experimental result is given in Figure 8.

The third test is a more complex clustering task on a multiclass, 500-point, 3-dimensional dataset. The number of clusters is varied from 2 to 15. FCM and K-means are not included in the third test because of their tendency to stop at local minima, which gives unstable stopping times and questionable validity of the resulting clusters; K-means and FCM had difficulty dealing with a large number of classes, and most of the time these algorithms stagnated at local minima. The clustering time with respect to the number of classes is given in Table V. Table VI shows the fitted linear regression versus the number of classes for each algorithm. A plot of the experimental result is given in Figure 9.
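The timing experiments can in principle be reproduced with a small harness along the following lines; this is a sketch only, where the synthetic data generator and the clustering callable are placeholders rather than the authors' benchmark code.

import time
import numpy as np

def make_two_class_2d(n_points, seed=0):
    """Synthetic 2-class, 2-dimensional data of a given volume (illustrative only)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(n_points // 2, 2))
    b = rng.normal(loc=(3.0, 3.0), scale=0.5, size=(n_points - n_points // 2, 2))
    return np.vstack([a, b])

def time_clustering(cluster_fn, data, repeats=5):
    """Average wall-clock clustering time in milliseconds for a given algorithm."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        cluster_fn(data)
        times.append((time.perf_counter() - start) * 1000.0)
    return float(np.mean(times))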

Figure 4. mPSC particle trajectory at each position update. Dots are data points, large circles denote starting points, X denotes final points.

Figure 5. RCE particle trajectory at each position update. Dots are data points, large circles denote starting points, X denotes final points.

Figure 6. RCE, PSC and mPSC fitness versus iteration number on the Iris dataset optimization. RCE detected stagnation at iteration 49 and stops automatically at iteration 59.


TABLE I. CLUSTERING TIME FOR EXPERIMENT 1

Volume (n)   Clustering Time (ms)
             RCE        PSC       mPSC      K-Means   FCM
100          92.85      945.29    930.91    1.88      1.49
1000         1015.47    12868.9   12736.7   2.59      5.517
2000         5375.57    33342.7   33171.2   3.27      11.77
4000         7594.49    110414    105994    6.40      17.27
10000        9933.29    524260    531608    8.90      31.47

TABLE II. FITTED POLYNOMIAL REGRESSION FOR EXPERIMENT 1

Algorithm    Fitted Polynomial Regression
RCE          t = 1.3173n − 107.5
PSC          t = 0.0043n^2 + 9.92n − 514.15
mPSC         t = 0.0045n^2 + 7.8962n − 236.25
K-Means      t = 0.0008n + 0.6282
FCM          t = 0.0036n − 0.1942

n denotes data volume, t denotes clustering time in milliseconds.

TABLE III. CLUSTERING TIME FOR EXPERIMENT 2

Dimension (d)  Clustering Time (ms)
               RCE       PSC       mPSC      K-Means   FCM
5              134.20    1571.27   1547.37   2.27      1.693
20             195.70    4630.28   4679.35   3.34      2.780
40             237.95    8736.73   8727.68   4.53      4.396
60             299.59    12864.8   12936.4   7.28      8.04
90             477.48    19130.5   19122.3   7.99      8.54

TABLE IV. FITTED LINEAR REGRESSION FOR EXPERIMENT 2

Algorithm    Fitted Linear Regression
RCE          t = 4.67d + 90.95
PSC          t = 206.59d + 522.04
mPSC         t = 206.77d + 514.02
K-Means      t = 0.0693d + 7.6695
FCM          t = 0.0729d + 2.205

d denotes data dimension, t denotes clustering time in milliseconds.

TABLE V. CLUSTERING TIME FOR EXPERIMENT 3

Classes (Nc)  Clustering Time (ms)
              RCE       PSC        mPSC
2             802.15    7020.37    6912.97
4             806.40    10595.23   10446.03
8             853.26    17544.70   17566.51
12            1027.51   24157.69   24186.53
15            1347.1    29060.40   29288.46

TABLE VI. FITTED LINEAR REGRESSION FOR EXPERIMENT 3

Algorithm    Fitted Linear Regression
RCE          t = 14.99Nc + 887.3
PSC          t = 1712.7Nc + 3768.8
mPSC         t = 1743.6Nc + 3593.3

Nc denotes number of classes, t denotes clustering time in milliseconds.

Figure 7. Time comparison between clustering algorithms with respect to varying volume.

Figure 8. Time comparison between clustering algorithms with respect to varying dimension.

Figure 9. Time comparison between clustering algorithms with respect to varying number of classes.


The results of these experiments show that RCE has significant advantage over PSC and mPSC in optimization speed.

Table I shows that the PSC and mPSC clustering times grow quadratically with the volume of the dataset, as expected from the approximation in (9). Table II and Figure 7 show this phenomenon: the regression of time versus volume is approximated by a second order polynomial.

Table III and Table IV describe the linear relationship between optimization time and data dimensionality, which is also seen in Figure 8. This experiment demonstrates that RCE has a significant speed advantage over PSC and mPSC when dealing with data of higher dimension.

Table V and Table VI show the linear relationship between optimization time and the number of classes. From the regression and the graph in Figure 9 it is evident that RCE has a much faster overall optimization time than its predecessors.

Another interesting observation stems from a comparison of the fitted polynomial regressions of PSC and mPSC. According to [9], mPSC should have shorter optimization time than that of PSC. However, the fitted regressions suggest that the optimization time for mPSC does not differ much from PSC, as seen in Tables II, IV, and VI.

B. RCE Performance and Cluster Quality Assessment

Szabo measures the performance of different clustering algorithms by comparing the Entropy (16, 19), Purity (17, 20) and Percentage of Misclassification (18, 21), whose formulas are listed in Table VII [9].

TABLE VII. TABLE OF FORMULAS

Meter                                       Formula
Entropy (Er)                                $E(S_r) = -\sum_{i=1}^{k} \frac{n_r^i}{n_r} \ln\frac{n_r^i}{n_r}$   (16)
Purity (Pr)                                 $P(S_r) = \frac{\max_i(n_r^i)}{n_r}$   (17)
Percentage Misclassification (Pmr)          $\%m(S_r) = \frac{n_r^{fp}}{n_r}$   (18)
Overall Entropy (E)                         $E_g = \sum_{r=1}^{k} \frac{n_r}{n} E(S_r)$   (19)
Overall Purity (P)                          $P_g = \sum_{r=1}^{k} \frac{n_r}{n} P(S_r)$   (20)
Overall Percentage Misclassification (Pm)   $P_m = \frac{n^{fp}}{n}$   (21)

Entropy measures cluster homogeneity (16). Lower entropy shows that the objects in a cluster are homogeneous. S_r indicates cluster r, k is the total number of classes in the cluster, and n_r^i is the number of objects of class i inside cluster r. The total entropy is the weighted sum of the entropies of each cluster (19), where n_r is the number of objects in cluster r and n is the dataset volume.

The purity index measures the purity of a cluster by taking the ratio of the dominant class of the group to the total number of objects inside the group (17). If the dominant class of a cluster is the same as the dominant class of another cluster, the next most significant class is selected. The overall purity is the weighted sum of the purity of each group (20). Higher purity is desirable for a good cluster.

Percent misclassified is the ratio of false positive classifications (fp) to the number of objects (21). A low percentage of misclassification is a criterion for a good cluster.
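For readers who wish to reproduce these measures, a minimal Python sketch of the overall entropy (19), purity (20) and misclassification (21) is given below. It uses the simple dominant-class rule and treats every object outside a cluster's dominant class as misclassified, which is our reading of (16)-(21) rather than the authors' exact scoring code.

import numpy as np
from collections import Counter

def cluster_quality(labels_true, labels_cluster):
    """Overall entropy (19), purity (20) and misclassification (21)
    built from the per-cluster definitions (16)-(18); illustrative sketch."""
    n = len(labels_true)
    E = P = n_fp = 0.0
    for r in set(labels_cluster):
        members = [labels_true[idx] for idx in range(n) if labels_cluster[idx] == r]
        n_r = len(members)
        counts = np.array(list(Counter(members).values()), dtype=float)
        p_ir = counts / n_r
        E += (n_r / n) * (-np.sum(p_ir * np.log(p_ir)))   # weighted entropy (16), (19)
        P += (n_r / n) * (counts.max() / n_r)             # weighted purity (17), (20)
        n_fp += n_r - counts.max()                        # objects outside the dominant class
    return E, P, n_fp / n                                 # overall misclassification (21)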

We tested the performance of each algorithm using datasets acquired from the UCI machine learning repository. The datasets used are Iris, Thyroid, Glass, Wine, Breast Cancer, and Diabetes Pima Indian. The characteristics of the datasets are listed in Table VIII. The clustering process is repeated 50 times. PSC and mPSC are set to optimize for 100 iterations, while RCE is set to optimize for a maximum of 500 iterations or until the global minimum is reached, after which it takes 100 additional iterations before stopping. ε is set to 1.0e-4.

TABLE VIII. CHARACTERISTICS OF DATASETS

Dataset                Volume   Dimension   Classes
Iris                   150      4           3
Thyroid                7200     21          3
Glass                  214      9           2
Wine                   178      13          3
Breast Cancer          699      9           2
Diabetes Pima Indian   768      8           2

TABLE IX. EXPERIMENTAL RESULTS

Dataset          Metric   RCE         PSC          mPSC         K-Means      FCM
Iris             E        .24±.03     .26±.03      .27±.03      .26±.02      .27±0.0
                 P        .87±.10     .89±.016     .89±.015     .83±.14      .89±0.0
                 Pm       .13±.10     .11±.016     .11±.015     .17±.14      .11±0.0
                 t        .68±.27     22.3±0.9     22.1±1.05    2e-3±6e-4    9e-3±4e-3
Thyroid          E        .60±.25     N/A          N/A          .77±.20      .92±0.0
                 P        .71±.12     N/A          N/A          .62±.11      .55±0.0
                 Pm       .29±.12     N/A          N/A          .38±.11      .45±0.0
                 t        14.3±3.8    ~            ~            .03±.013     .49±.07
Glass            E        .28±.09     .30±.05      .30±.036     .32±.07      .30±0.0
                 P        .85±.06     .89±.03      .89±.017     .88±.04      .91±0.0
                 Pm       .15±.06     .123±.03     .114±.017    .12±.04      .094±0.0
                 t        .79±.28     46.02±.02    45.87±.30    3e-3±2e-3    7e-3±5e-3
Wine             E        .51±.10     .62±.007     .62±.008     .61±.03      .63±0.0
                 P        .62±.07     .71±.01      .71±.001     .67±.07      .69±0.0
                 Pm       .38±.07     .288±.01     .289±.01     .33±.07      .31±0.0
                 t        1.65±.63    66.6±.43     66.4±.4      3e-3±7e-4    .02±4e-3
Breast Cancer    E        .16±.001    .17±.013     .17±.014     .17±0.0      .18±0.0
                 P        .96±.003    .96±.006     .95±.006     .96±0.0      .95±0.0
                 Pm       .04±.003    .04±.006     .05±.006     .043±0.0     .047±0.0
                 t        0.85±.24    195±1.8      194±1.1      3e-3±5e-4    .01±.02
Diabetes Pima    E        .23±.13     .51±.03      .50±.03      .50±0.0      .55±0.0
Indian           P        .66±.009    .66±.004     .66±.003     .66±0.0      .66±0.0
                 Pm       .34±.002    .34±.004     .34±.003     .34±0.0      .34±0.0
                 t        0.95±.29    202±7.4      202±11.19    6e-3±1e-3    .04±9e-3

E denotes entropy, P denotes purity, Pm denotes percentage misclassification, t denotes time (seconds).


From the experimental results presented in Table IX, it is evident that RCE produces results superior to PSC and mPSC. RCE, PSC, and mPSC should theoretically produce generally better clusters than K-means and FCM because of their ability to escape local minima [8, 9]. However, Table IX shows that in some cases, such as the Diabetes and Wine datasets, K-means produces better centroids than PSC and mPSC. The cluster quality of PSC and mPSC is possibly undermined by their less stable movement, as explained in III.B. In all cases, RCE produces clusters superior to those of the other algorithms; this is especially evident on the Diabetes dataset.

An intriguing observation was made when clustering the Thyroid dataset. This dataset is particularly interesting because of its large volume (7200 points) and high dimensionality (21 dimensions), and it has been considered a difficult benchmark dataset for machine learning [11]. On this dataset, RCE produces significantly higher quality clusters than the other algorithms (E = 0.60 ± 0.25; P = 0.71 ± 0.12; Pm = 0.29 ± 0.12) with an average optimization time of 14.3 seconds over 50 tests. This result indicates the superior capability of RCE for finding the global minimum on complex data clustering problems. K-means and FCM produce clusters with 62% and 55% purity on this dataset, which is suspected to correspond to a local minimum. We are unable to report any result for PSC and mPSC since neither was able to finish an optimization run after 12 hours, at which point the optimization was aborted. Another important observation is that the results produced by PSC, mPSC and RCE have relatively high standard deviations despite their lower mean values. There is therefore a suspicion that stopping the search when an equilibrium state is reached may not be a beneficial property.

A graphical representation of the entropy values in Table IX for the different datasets and algorithms can be seen in Figure 11. Entropy is minimized to zero if and only if every cluster is perfectly pure [12]. It can be seen in Figure 11 that the overall entropy of the clusters produced by RCE is significantly lower than that of the other algorithms.

V. CONCLUSIONS AND FUTURE RESEARCH

A modification of the mPSC algorithm, which we call RCE, has been proposed in this paper. This modification significantly reduces the time complexity and increases the efficiency of each position update.

The performance of each algorithm has been investigated. Based on the optimization time evaluation, RCE performs much faster than its predecessors. We have shown the relationship between the optimization time of the RCE, PSC and mPSC algorithms and the data dimension, volume, and number of classes. RCE's dramatic increase in optimization speed does not undermine the quality of the clusters it produces. Instead, experimental results on benchmark datasets show that RCE has an advantage over PSC and mPSC.

The high standard deviations in the experimental results suggest that further algorithmic improvement is possible. In order to improve the capability of RCE, further investigation, including analysis of the behavior of RCE and advanced techniques to increase repeatability, needs to be considered. Experiments using different fitness functions and minimization approaches will be carried out.

REFERENCES

[1] M. R. Peterson, T. E. Doom, and M. L. Raymer, "GA-facilitated KNN classifier optimization with varying similarity measures," in Proc IEEE Congress on Evolutionary Computation, 2005, pp. 2514–2521.

[2] R. Eberhart and J. Kennedy, “A New Optimizer Using Particle Swarm Theory”, in IEEE Sixth International Symposium on Micro Machine and Human Science, 1995, pp. 39-43.

[3] G.I. Evers and M.B. Ghalia, “Regrouping Particle Swarm Optimization: A New Global Optimization Algorithm with Improved Performance Consistency Across Benchmarks,” in Proc International Conference on Systems, Man, and Cybernetics 2009, San Antonio, TX, USA, 2009, pp.3901-3908

[4] M. Yuwono, A.M.A. Handojoseno, H.T. Nguyen, “Optimization of head movement recognition using Augmented Radial Basis Function Neural Network,” in Proc 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, 2011, pp.2776-2779.

[5] M. Yuwono, S.W. Su, B. Moulton, “Fall detection using a Gaussian Distribution of Clustered Knowledge, Augmented Radial Basis Neural-Network, and Multilayer Perceptron” in Proc 2011 International Conference in Broadband and Biomedical Communications, Melbourne, Vic, Australia, 2011, to be published.

[6] M. Yuwono, "Unwrapping Hartmann-Shack images of off-axis aberration using artificial centroid injection method," in Proc 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI), Shanghai, China, 2011, vol.1, pp.560-564

[7] D.W. van der Merwe and A.P. Engelbrecht, "Data Clustering using Particle Swarm Optimization," in Congress on Evolutionary Computation, vol.1, 2003, pp. 215–220.

[8] S.C.M. Cohen & L.N. de Castro, “Data Clustering with Particle Swarms,” in Proc 2006 IEEE Congress on Evolutionary Computations, 2006, pp.1792-1798.

[9] A. Szabo, A.K.F. Prior, L.N. de Castro, “The Proposal of a Velocity Memoryless Clustering Swarm,” in Proc 2010 IEEE Congress on Evolutionary Computation (CEC) , 2010, pp.1-5.

[10] A. Oliveira, “Decision-making theories and models – A discussion of rational and psychological decision-making theories and models: The search for a cultural-ethical decision-making model,” in Electronic Journal of Business Ethics and Organization Studies (EJBO), 2007, vol.12, no.2, pp.12-17.

[11] F. Saiti, A.A. Naini, M.A. Shoorehdeli, M. Teshnehlab, "Thyroid Disease Diagnosis Based on Genetic Algorithms Using PNN and SVM," in Proc 3rd International Conference on Bioinformatics and Biomedical Engineering (ICBBE), 2009, pp.1-4.

[12] H. Li, K. Zhang, T. Jiang, "Minimum entropy clustering and applications to gene expression analysis," in Proc 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), 2004, pp. 142- 151.

Figure 11. Entropy of different algorithms on different datasets