A tabu search algorithm for cluster building in wireless sensor networks

A Tabu Search Algorithm for Cluster Buildingin Wireless Sensor Networks

Abdelmorhit El Rhazi and Samuel Pierre, Senior Member, IEEE

Abstract—The main challenge in wireless sensor network deployment pertains to optimizing energy consumption when collecting data

from sensor nodes. This paper proposes a new centralized clustering method for a data collection mechanism in wireless sensor

networks, which is based on network energy maps and Quality-of-Service (QoS) requirements. The clustering problem is modeled as a

hypergraph partitioning and its resolution is based on a tabu search heuristic. Our approach defines moves using largest size cliques in

a feasibility cluster graph. Compared to other methods (CPLEX-based method, distributed method, simulated annealing-based

method), the results show that our tabu search-based approach returns high-quality solutions in terms of cluster cost and execution

time. As a result, this approach is suitable for handling network extensibility in a satisfactory manner.

Index Terms—Wireless sensor network, energy map, data collect, clustering methods, tabu search.

Ç

1 INTRODUCTION

INCREASINGLY, several applications require the acquisition

of data from the physical world in a reliable and automatic

manner. This necessity implies the emergence of new kinds

of networks, which are typically composed of low-capacity

devices. Such devices, called sensors, make it possible to

capture and measure specific elements from the physicalworld (e.g., temperature, pressure, humidity). Moreover,

they run on small batteries with low energetic capacities.

Consequently, their power consumption must be optimized

in order to ensure increased lifetime for those devices.

During data collection, two mechanisms are used to reduce

energy consumption: message aggregation and filtering of

redundant data. These mechanisms generally use clustering

methods in order to coordinate aggregation and filtering.Clustering methods belong to either one of two cate-

gories: distributed and centralized. The centralized ap-

proach assumes that the existence of a particular node is

cognizant of the information pertaining to the other net-

work nodes. Then, the problem is modeled as a graph

partitioning problem with particular constraints that render

this problem NP-hard. The central node determines clusters

by solving this partitioning problem. However, the major

drawbacks of this category are linked to additional costs

engendered by communicating the network node informa-

tion and the time required to solve an optimization

problem. In the second category, the distributed method,

each node executes a distributed clustering algorithm [7],

[14], [15], [16]. The major drawback of this category is that

nodes have limited knowledge pertaining to their neighbor-

hood. Hence, clusters are not built in an optimal manner.In [4], Ghiasi et al. propose centralized clustering for

sensor networks. They model this problem as a k-means

clustering problem, which is defined as follows [11]: let P be aset of n data points in d-dimensional spaceRd and an integerk, and the problem consists of determining a set of k points

in Rd, called centers, to minimize the mean squared distancefrom each data point to its nearest center. Heinzelman et al.[7] propose a centralized version of Low Energy AdaptiveClustering Hierarchy (LEACH), their data collection proto-

col, in order to produce better clusters by dispersing clusterhead nodes throughout the network. In this protocol, eachnode sends information regarding its current location and

energy level to the sink node, which computes the node’smean energy level, and nodes, whose energy level is inferiorto this average, cannot become cluster heads for the currentround. Considering the remaining nodes as possible cluster

heads, the sink node finds clusters using the simulatedannealing algorithm [1] in order to find optimal clusters.This algorithm attempts to minimize the amount of energy

required for noncluster head nodes to transmit their data tothe cluster head, by minimizing the sum of squareddistances between all noncluster head nodes and the closestcluster head.

The energy map, the component that holds informationconcerning the remaining energy available in all network

areas, can be used to prolong the network’s lifetime [7]. Intheir probabilistic model for energy consumption,Heinzelman et al. [7] claim that each sensor node can be

modeled by a Markov chain. They provide an equation thatcan be used by each node to calculate its energy dissipationrate, ET , for the next T time steps. With the remainingenergy, the value ET can be sent to the sink node for energy

map building purposes.This paper proposes a new centralized clustering

mechanism equipped with energy maps and constrainedby Quality-of-Service (QoS) requirements. Such a clustering

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009 433

. The authors are with the Mobile Computing and Networking ResearchLaboratory (LARIM), Department of Computer Engineering, �EcolePolytechnique de Montreal, C.P. 6079, Station succ. Centre-ville,Montreal, QC H3C 3A7, Canada.E-mail: {abdelmorhit.el-rhazi, samuel.pierre}@polymtl.ca.

Manuscript received 13 Aug. 2007; revised 9 Apr. 2008; accepted 20 Aug.2008; published online 4 Sept. 2008.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TMC-2007-08-0243.Digital Object Identifier no. 10.1109/TMC.2008.125.

1536-1233/09/$25.00 � 2009 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS

mechanism is used to collect data in sensor networks. Thefirst original aspect of this investigation consists of addingthese constraints to the clustering mechanism that helps thedata collection algorithm in order to reduce energy con-sumption and provide applications with the informationrequired without burdening them with unnecessary data.Centralized clustering is modeled as hypergraph partition-ing. The novel method proposes the use of a tabu searchheuristic to solve this problem. The existing centralizedclustering methods cannot be used to solve this issue due tothe fact that our approach to model the problem assumesthat the numbers of clusters and cluster heads are unknownbefore clusters are created, which constitutes another majororiginal facet of this paper.

The remainder of this paper is organized as follows:

Section 2 summarizes the data collection mechanism.

Section 3 outlines the problem formula. Section 4 describes

the tabu search adaptation. Computational experiments and

results are reported in Section 5. Section 6 concludes this

paper and delineates some of the remaining challenges.

2 DATA COLLECTION MECHANISM

Generally, sensor networks contain a large quantity of

nodes that collect measurements before sending them to the

applications. If all nodes forwarded their measurements,

the volume of data received by the applications would

increase exponentially, rendering data processing a tedious

task. A sensor system should thus contain mechanisms that

allow the applications to express their requirements in

terms of the required quality of data. Data aggregation and

data filtering are two methods that reduce the quantity of

data received by applications. The aim of those two

methods is not only to minimize the energy consumption

by decreasing the number of messages exchanged in the

network but also to provide the applications with the

needed data without needlessly overloading them with

exorbitant quantities of messages.

The aggregation data mechanism allows for the gathering

of several measures into one record whose size is less than

the extent of the initial records. However, the resultsemantics must not contradict the initial record semantics.

Moreover, it must not lose the meanings of the initial

records. The data filtering mechanism makes it possible to

ignore measurements considered redundant or those irrele-

vant to the application needs. A sensor system provides the

applications with the means to express the criteria used to

determine measurement relevancy, e.g., an application

could be concerned with temperatures, which are 1) lower

than a given value and 2) recorded within a delimited zone.

The sensor system filters the network messages and

forwards only those that respect the filter conditions.

Applications that use sensor networks are generally

concerned with the node measurements within a certain

period of time. Hence, the most important key indicators in

sensor networks are the quality of the measurements and

the network lifetime. An application designed to record the

mean temperature in zones where the sensors are deployed

could be associated with a set of requirements in terms of

measured frequencies (e.g., the sensor system must record a

measurement every 15 minutes), in terms of a measurement

discrepancy thresholds (e.g., the sensor system must ignore

data whose result is less than 10 percent of the previous

value), and in terms of the sensor lifetime (e.g., measure-

ments must be provided for one year).In [3], we propose a novel data collection approach for

sensor networks that use energy maps and QoS require-ments to reduce power consumption while increasingnetwork coverage. The mechanism comprises two phases:during the first phase, the applications specify their QoSrequirements regarding the data required by the applica-tions. They send their requests to a particular node S, calledthe collector node, which receives the application query andobtains results from other nodes before returning them tothe applications. The collector node builds the clusters,optimally using the QoS requirements and the energy mapinformation. During the second phase, the cluster headsmust provide the collector node with combined measure-ments for each period. The cluster head is in charge ofvarious activities: coordinating the data collection within itscluster, filtering redundant measurements, computingaggregate functions, and sending results to a node collector.

3 PROBLEM FORMULATION

The considered network contains a set V of m stationarynodes whose localizations are known. The communicationmodel can be described as multihop, which means thatcertain nodes cannot send measurements directly to thecollector node: they must rely on their neighbors’ service.An application can specify the following QoS requirements:

1. Data collection frequency, fq. The network providesresults to the application every time the duration fqexpires.

2. A measurement uncertainty threshold, mut. If thedifference between two simultaneous measurementsfrom two different nodes in the same zone (fourthrequirement) is inferior to mut, then one of them isconsidered redundant.

3. A query duration, T . The network required for thequery run a total time whose value is equal to T .

4. A zone size step. The step value determines the zonelength. Within a single zone, measurements areconsidered redundant. If an application requiresmore precision, it could decrease the step value oreven ignore the transfer of such value.

The goal of the clustering algorithm is to 1) split thenetwork nodes into a set of clusters Gi that satisfies theapplication requirements, 2) reduce energy consumption,and 3) prolong the network lifetime. Clusters are builtaccording to the following criteria:

. Maximize network coverage using the energy map;

. Gather nodes likely to hold redundant measurements;

. Gather nodes located within the same zone delim-ited by the application.

Based on those criteria, a cluster building problem (CBP)in the remainder of this paper consists of determining theset Gi that fulfills the following conditions:

434 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009

1: ð[iGi ¼ V Þ ^ ð\

iGi ¼ �Þ; ð1Þ

2: 8Nj;Nl 2 Gi MjðtÞ �MlðtÞ�� mut; ð2Þ

3: 8Nj;Nl 2 Gi dj;l � step; ð3Þ4: 9Nj 2 Gi ET

j � Eremainingj : ð4Þ

Here, Gi consists of a cluster that contains a set of nodesand a particular node that represents the cluster head. MjðtÞrepresents Nj node reading during the time slot t; dj;lcorresponds to the distance between nodes Nj and Nl; E

Tj is

equal to the estimated energy dissipation of node Nj duringperiod T ; Eremaining

j illustrates the energy remaining in nodeNj when the cluster building algorithm starts running.

The first condition makes it possible to structure nodesinto disjoined sets. The second condition permits thegathering of nodes that are likely to record redundantmeasurements that will be filtered by the network nodes.This condition can be verified using a current node measure-ment or previously taken mean measurements. In this case,the sensor node must be able to store measurement means.The third condition compels the gathering of nodes located inzones whose lengths are determined by the step value that isgenerated by the application. The fourth condition ensuresthat each cluster contains at least one node that guarantees thecoverage of the entire zone during the query run time.

CBP is considered as a hypergraph partitioning problem.The network nodes are modeled on a hypergraph with avertex set V ¼ f1; . . . ;mg. The arcs belong to a set ofclusters G ¼ fG1; . . . ; Gng. Additionally, let us define

. ci: the cost of cluster Gi;

. dj;s: the distance between node j and the collectornode s;

. ai;j ¼ 1: if cluster Gj contains node i, 0 otherwise;

. ej ¼ 1: if the energy level of node j satisfiesETj � E

remainingj , 0 otherwise.

The binary decision variables are given as follows:

. xj¼1 if cluster j is used for partitioning, 0 otherwise.

Hence, the CBP formulation can be expressed as

minimizeXnj¼1

cjxj

!ð5Þ

subject to

Xnj¼1

aijxj ¼ 1; for i ¼ 1; . . . ;m; ð6Þ

xj 2 f0; 1g; for j ¼ 1; . . . ; n; ð7ÞMbðtÞ �McðtÞj j � mut; 8b; c 2 Gj; for j ¼ 1; . . . ; n; ð8Þdb;c � step; 8b; c 2 Gj; for j ¼ 1; . . . ; n; ð9ÞXmi¼1

aijei � 1; for j ¼ 1; . . . ; n: ð10Þ

Equation (6) ensures that each node is included in asingle cluster; this is called a partitioning constraint.Equations (8), (9), and (10) represent the cluster buildingcriteria (2), (3), and (4), respectively.

The objective of a cluster building phase is to minimizeenergy dissipation when collecting node data. The cost cj of

a cluster Gj should reflect this objective. The cost should becomposed of two major terms. The first one represents theenergy consumption due to the cluster head duties. Indeed,

it is responsible for data collection and aggregation, as wellas the transmission of the measures of its cluster. Thesecond term represents the energy gained due to the factthat messages of other cluster nodes will be filtered by the

data collection mechanism.Generally, the energy consumed by a node for a single

cycle is expressed as follows [8]:

Ecycle ¼ ED þ ES þET þER; ð11Þ

where ED, ES , ET , and ER represent the energy required fordata processing, sensing, transmitting, and receiving percycle time, respectively. The quantity of energy spent foreach operation depends on the network and the event

model. Roughly, (11) is approximated by the preponderantterm that is ET [9]. When sending 1 bit from node u to v, theenergy consumed is expressed as follows [2]:

Euv ¼ Etxelec þ "ampd�uv � � 2:0: ð12Þ

Here, factor � indicates the path loss exponent and relies

on the communication channel as well as environmentalconditions. Etxlec depicts the energy dissipated by theelectronic transmitter, and "amp denotes a constant parameterthat characterizes the transmitter amplification.

The communication model considered consists of a

multihop model. Consequently, to express the energyconsumed by communicating node i measurements to sinknode s, the equation must consider the communication

links between each pair of nodes found on the path betweeni and s. However, such a complex equation would bedifficult to compute as it depends on the routing protocol.For this reason, only the distance that separates nodes i and

s is considered. This simplification is justified by the factthat, in a dense network, routing protocols can find a pathbetween i and s, which is similar to a line joining i and s onthe one hand. On the other hand, this simplification is also

valid in single-hop communication models.Consequently, the cost cj of cluster Gj is expressed as

follows:

cj ¼ �d�j;s � �Xnji¼1

d�i;s; ð13Þ

where nj indicates the number of nodes included incluster Gj, which is not currently a cluster head. dj;srepresents the distance between the cluster head Gj andsink node; � and � reflect two positive coefficients. The first

term of (13) represents the estimated energy consumed bythe cluster head to communicate the collective measure-ments. The second term represents the total energy savedby cluster Gj nodes by filtering their messages.

The problem thus consists of minimizingPn

j¼1ð�d�j;s ��Pnj

i¼1 d�i;sÞxj given (6)-(10). When formulated that way, the

problem is considered NP-hard since it is modeled as apartitioning problem with additional constraints and it is

known that the partitioning problem is NP-hard [6].Consequently, CBP cannot be resolved by a polynomial

EL RHAZI AND PIERRE: A TABU SEARCH ALGORITHM FOR CLUSTER BUILDING IN WIRELESS SENSOR NETWORKS 435

algorithm. This explains why a tabu search heuristic was

adopted in order to find the best solution.

4 A TABU SEARCH APPROACH

In order to facilitate the usage of tabu search for CBP, a newgraph called Gr is defined. It is capable of determiningfeasible clusters. A feasible cluster consists of a set of nodes

that fulfill the cluster building constraints (8), (9), and (10).Nodes that satisfy Constraint (10), i.e., ensure zone cover-age, are called active nodes. The vertices of Gr represent the

network nodes. An edge ði; jÞ is defined in graph Gr

between nodes i and j if they satisfy Constraints (8) and (9).Consequently, it is clear that a clique in Gr embodies afeasible cluster. A clique consists of a set of nodes that are

adjacent to one another.Five steps should be conducted in order to adapt tabu

search heuristics to solve a particular problem:

1. design an algorithm that returns an initial solution,2. define moves mð�Þ that determine the neighborhood

NðsÞ of a solution s,3. determine the content and size of tabu lists,4. define the aspiration criteria,5. design intensification and diversification

mechanisms.

The algorithm ends when one of the following three

conditions occurs:

1. All possible moves are prohibited by the tabu lists;2. The maximal number of iterations allowed has been

reached;3. The maximal number of iterations, where the best

solution is not enhanced successively, has beenreached.

4.1 Initial Solution

The goal is to find an appropriate initial solution for theproblem, in order to get the best solution from tabu search

iterations within a reasonable delay. The algorithm depictedin Fig. 1 is proposed. It starts sorting active nodes accordingto their degree in graph Gr decreasingly. For each iteration,the first active node i, not yet covered by the initial solution

F0, is selected. The algorithm determines the largest sizeclique that contains the selected active node i with its

adjacent nodes in graph Gr, which have yet to be covered byF0. This clique is considered a new cluster and node i

becomes the cluster head.The algorithm does not ensure that all nonactive nodes

are assigned to a cluster. Consequently, if node i is notcovered by any cluster when the algorithm ends, it isassigned to a cluster whose head is adjacent to node i.However, this leads to the fact that an initial solution couldnot be feasible, i.e., nodes made up of at least one cluster doesnot consist of a clique in the graph Gr. A penalty equation toevaluate a solution is proposed in the following sections.

4.2 The Neighborhood NðsÞ Definition

The definition of the neighborhood NðsÞ of a solution s is acrucial step as it determines the final quality of the solutionand has a direct impact on the execution time. Two types ofmoves are distinguished: the first move involves anordinary node, i.e., a nonactive node, and the second moveinvolves an active node. This is due to the fact that an activenode could be a cluster head and thus build a new cluster.Furthermore, the third move that involves a cluster headand allows removing an existing cluster from a solution isalso considered.

1. A Move Involving a Regular Node. Let s representthe solution analyzed for each iteration. Solution sconsists of a set of clusters. Let a be a regular node.The first type of moves mðs; aÞ is defined as follows:assume that node a is assigned to cluster Gi in s.Move mðs; aÞ assigns node a to another clusterGj ðGi 6¼ GjÞ whose head is adjacent to node a andremoves it from cluster Gi.

2. A Move Involving an Active Node. This secondtype of move relies on the fact that an active nodecould be a cluster head. Let a represent an activenode. Move mðs; aÞ consists of

a. Reassigning node a to a cluster whose head isadjacent to a. This move is similar to those of thefirst type, since an active node could be includedin a cluster without becoming its head.

b. Select node a to become the head of the cluster towhich it is assigned. The previous cluster headwill still be assigned to this cluster, although it isnot the cluster head. Consequently, the cost ofthis cluster will be affected as it relies on thehead coordinates.

3. A Move Involving a Cluster Head. The third type ofmove involves the head of an existing cluster. Let abe a head of cluster Gi, which is empty, i.e., Gi

contains only its head a. Move mðs; aÞ consists ofremoving cluster Gi from solution s and assigningnode a to a cluster whose head is adjacent to a.

These three types of moves engender a variety ofsolutions by producing several combinations of clustersand cluster heads. Also, they create solutions whose sizesvary. However, they can produce clusters that do notnecessarily consist of a clique since node a is reassignedwithout verifying whether the resulting cluster is a cliqueor not. A cluster penalty is defined and added to thecluster cost in order to compare NðsÞ solutions. Node a


Fig. 1. Initial solution algorithm.

penalty Pa assigned to cluster Gi represents the number ofnodes in Gi, which is not adjacent to node a. Conse-quently, function f 0 makes it possible to compare theelements of neighborhood NðsÞ. It is expressed as follows:

f 0 mðs; �Þð Þ ¼Xjsjj¼1

cjxj þ �P�: ð14Þ

In this formula, �, called the penalty coefficient, happensto be positive. More details will be provided further in thispaper. jsj represents the number of clusters in solution s.Function f 0ðmðs; aÞÞ contains two parts: the first is equal tothe total of cluster costs and the second term represents thepenalty caused by the move. For each iteration, this functionis used to compare NðsÞ solutions. To perform thisevaluation quickly, it suffices to calculate the gain of asolution in NðsÞ compared to solution s without recalculat-ing the value of f 0 for each iteration. This is possible sincemove mðs; aÞ affects only a maximum of two clusters.Hence, we define the gain Gainðmðs; aÞÞ associated withmove mðs; aÞ as follows:

1. If move mðs; aÞ consists of reassigning node a fromcluster Gi to cluster Gj, the gain is

Gain mðs; aÞð Þ ¼ �ðPaj � PaiÞ: ð15Þ

Paj represents the penalty for assigning node a tocluster Gj. The cluster costs disappear from (15) asthey will be mutually neutralized.

2. If move mðs; iÞ consists of selecting node i as thehead of the cluster Gj, where it is assigned, the gainis expressed as follows:

Gain mðs; iÞð Þ ¼ �d�i;s � �d�t;s� �

� �d�t;s � �d�i;s� �

: ð16Þ

Node t denotes the current head of cluster Gj. Theresult is obvious if we consider (13) and (14). Thepenalty values are identical as node i does notchange clusters.

4.3 Tabu List and Aspiration Criteria

Occasionally, tabu search methods accept solutions that donot improve the objective function, in the hope of reachingimproved solutions in the future. However, acceptingsolutions that are not necessarily optimal introduces a cyclerisk, i.e., a return to previously considered solutions, hencethe idea of keeping a tabu list, to keep track of the solutionsthat have been considered in the past. Thus, whengenerating the neighborhood candidates, the solutions thatappear in the tabu list are removed. Our adaptationproposes two tabu lists: a reassignment list and a reelectionlist. The first tabu list prevents cycles that can be generatedby the reassigning of a node to the same cluster. After eachmove mðs; aÞ, which consists of reassigning node a tocluster Gi, the pair (a, head of Gi) is added to this tabu list.The second tabu list prevents the reelection of an activenode in the same cluster. After a move mðs; aÞ, consisting ofelecting node a in cluster Gi, two pairs of nodes are addedto the reelection list: the first pair (a, head of Gi) prohibitsthe move mðs; aÞ and the second pair (head of Gi, a)prevents the reverse move. The reassignment of the tabu list

is initialized by the pairs that represent the initial solutionbefore starting the iterations. Such a strategy prevents thereturn to the initial solution.

Using a tabu list could drastically restrict the neighbor-hood NðsÞ. Moreover, it could miss certain attractivesolutions. Consequently, tabu search methods allow theviolation of tabu list rules through the definition of anaspiration criterion. Our proposal opts for the most usedaspiration criterion, which consists of considering a moveinventoried in the tabu list, which in turn, engenders asolution that is superior to the best solution found in thefirst place.

4.4 Diversification and Intensification

Diversification and Intensification are two mechanisms thatmake it possible to improve tabu search methods. They startby analyzing the appropriate solutions visited and obtaintheir common properties in order to be able to intensify thesearch in another neighborhood or to diversify the searches.In tabu searches, this mechanism is called long-term memory.Our proposal uses a technique called the shifting penalty tactic,which is an instance of a procedure called strategic oscillation,representing one of the basic diversification approaches fortabu searches [5]. The approach consists of directing thesearch toward and away from selected boundaries offeasibility, either by manipulating the objective function(e.g., with penalties or incentives, as the case may be) orsimply by compelling the choice of move that leads the searchin specified directions. In this particular case, the penaltycoefficient is determined dynamically, according to theviolation of the clique constraint. Thus, if the solutionselected at the end of an iteration contains a cluster withnonnull penalties, i.e., the cluster does not consist of a clique,the penalty coefficient value is increased and vice versa.

5 COMPUTATIONAL EXPERIENCE AND RESULTS

In order to evaluate the performance of these novelalgorithms, they were implemented with the use of C++and the Boost Graph Library (BGL) [10] and tested withsensor networks of different sizes and topologies. On theone hand, several experiments were conducted to evaluatethe impact of the tabu search method parameters. On theother hand, another approach was devised to solve thepartitioning problem based on CPLEX [18] in order tocompare the quality of the solutions found by tabu search.This new method had to be designed and implemented asthe existing approach for clustering problems cannot beused in this case, as explained at the beginning of thispaper. The algorithms run on a 1.80-GHz Pentium 4,equipped with a Linux server (Red Hat 3.4.4-2), a 1-Gbytememory, and an Intel processor.

5.1 Analysis of the Impact of Tabu SearchParameters

The size of the tabu list has a direct impact on the quality ofthe solution. Hence, it is important to analyze its impact, inorder to adjust its value accordingly. Results reported in [5]show that determining the tabu list size dynamically ismore efficient than fixing its value during the iterations.This experiment involves a sensor network composed of


100 nodes. A square topology is used, i.e., nodes arise on the

summits of squares that cover the entire network area. Tofacilitate the analysis, it is assumed that all nodes are active.

Only the size of the tabu list varies, and the values of all

other parameters are set, e.g., the maximum number of

iterations allowed is set to 1,000. Fig. 2 illustrates the results

of the investigation.Results corroborate with those described in [5]. Indeed,

the best results are obtained using tabu lists whose size is

similar to the number of nodes, i.e., between 75 and 120.

Results associated with a tabu list of large values hinder the

quality of the solution as they reduce the number ofelements visited by interdicting additional moves. More-

over, low values also hinder quality due to the cycle

generation. Thus, we determine that it is best to set the size

of the tabu list dynamically within the interval ½0:75N; 1:1N �(N indicates the number of nodes).

A second experiment was conducted in order to quantify

the impact of the parameter “maximum number of allowed

iterations” on the quality of the solution found and

determine the limits of the solution enhancement. A sensor

network that comprises 1,000 nodes localized in a square

topology is considered. The maximum number of iterations

is modified and the other parameters are set except for the

“maximum number of iterations where the best solution is

not enhanced” parameter, which takes a value that allows

the algorithm to reach the maximum number of iterations.

Fig. 3 shows the impact of this parameter on the percentage

of the improvement of the initial solution cost.


Fig. 2. Impact of the tabu list size on the solution cost.

Fig. 3. Impact of the maximum number of iterations on the solution cost.

Results show that by increasing the maximum number ofiterations, the solution costs is enhanced; hence, higherquality solutions are obtained. This occurs until thisparameter reaches a value where solutions have almostidentical cost in spite of the increasing number of iterations.Indeed, between 0 and 5,000 iterations, the solution qualityis enhanced by a factor of 2.3. However, between 5,000 and10,000 iterations, such enhancement is evident only1.35 times. This is mainly due to the influence of the otherparameters and the initial solution’s algorithm, whichyields a satisfactory solution.

The analyses regarding the impact of the maximalnumber of iterations on the execution time and the quantityof moves yield the same results. Fig. 4 shows such an impact.Within the range of 800 to 5,000 iterations, the execution timeincreases 10-fold and the quality of the solution is enhancedefficiently, i.e., solution costs are reduced by 40 percent.Thus, we conclude that the quality of the solution isenhanced by increasing the maximum number of iterationsuntil a value is reached where the execution time increaseswithout enhancing the solution efficiently. This value, whichwe will be using in the following experiments, is equal toapproximately eight times N .

5.2 Comparing Tabu Search Approach withCPLEX-Based Method

As stated previously, none of the analyzed clusteringmechanisms can be compared with our approach. Conse-quently, we devised and implemented a second method tosolve the CBP clustering problem. The objective is tocompare the quality of the solution found with the firstapproach. This new method is based on CPLEX to solve theproblem. CPLEX could not be used to directly solve thepartitioning problem, since the number of feasible clustersis quite significant and CBP contains saturated constraints.For these reasons, this new method finds an optimalcovering of the hypergraph and uses this covering to findthe best partitioning of the hypergraph. Indeed, the methodcontains three steps. In the first step, a set of feasible

clusters is defined, using the graph Gr. The second step,called the optimization phase, makes it possible to returnoptimal covering of the hypergraph using CPLEX. The laststep, called the postoptimization phase, aims to use the resultsof the previous step to find hypergraph partitioning. Theexperiments run on the server described in the previoussection, using CPLEX 10.1 and considers square topologysensor networks in which all nodes are active and QoSparameters are set (i.e., the values of step and mut). TheN number of nodes varies and the maximum number ofiterations is set at 8�N .

Fig. 5 illustrates the solution costs when varying thenumber of nodes for both methods (tabu search andCPLEX). It reveals that the solutions found by these twomethods are rather similar except for cases where thenumber of nodes is superior to 700 nodes, at which point thetabu search solutions generate better results. This is due tothe use of the postoptimization phase in the second method,which does not necessarily return an optimal solution. Also,the first step limits the considered solution sets.

Fig. 6 illustrates the execution time as the number ofnodes for both methods varies. It also reveals that theexecution times of both methods are highly similar fornetworks of fewer than 700 nodes. However, for this size,the execution time of the method based on CPLEX increasessignificantly and the execution time of the tabu searchincreases reasonably. This is mainly due to the fact thatsimplex, used by CPLEX in this case, consists of anexponential method.

Consequently, the results of this experiment show that,on the one hand, the solutions returned by our methodbased on tabu search are associated with high quality interms of cluster costs and execution time. On the otherhand, this approach behaves well with the networkextensibility, i.e., the execution time increases in a satisfac-tory manner as network size augments.

The aforementioned experiments were conducted usinga square topology. In order to validate our results withother topologies, the same study was conducted using a


Fig. 4. Impact of the maximum number of iterations on the execution time.

random topology where nodes are randomly localized overa 1-km2 surface. The same parameter values as in squaretopology are used, e.g., the same value of step and mut, andthe maximum number of iterations allowed is set at 8�N .Node localization is distributed in a uniform manner.Random reel numbers are generated using the Boost library[17]. Node density varies from 20 to 900 nodes/km2.

Fig. 7 shows solution costs when varying the nodedensity for the method based on tabu search and themethod based on CPLEX. Results are similar to thoseobtained with the square topology. Indeed, the solutioncosts for both methods are almost identical. Analyzing theexecution times of both methods reveals the same conclu-sion. Fig. 8 illustrates the execution time as the node densityvaries for both methods. Results show that the executiontime increases slightly for the tabu search-based method,contrary to the CPLEX-based method, where the executiontime augments drastically. This is justified with the samereasons as the square topology.

Such investigations allow for the validation of theresults obtained using square topologies. On the onehand, the solutions returned by the method based on tabusearch provide high quality in terms of cluster cost andexecution time. On the other hand, the method based ontabu search behaves well with network extensibility. Thisalso applies to networks that support square or randomtopologies.

5.3 Comparing Centralized Approach, DistributedApproach, and TAG

We devise the centralized approach for the CBP because wenoticed from our previous experiences described in [3] thatthe distributed approach does not ensure an optimalsolution. We conducted other experiences in order toevaluate the difference between the performances of thecentralized and distributed approaches.

Fig. 9 depicts solution costs when varying the networksize for the centralized and distributed approaches. Results


Fig. 6. Comparing execution time (tabu search and CPLEX): Square topology.

Fig. 5. Comparing the solution costs (tabu search and CPLEX): Square topology.

show that effectively the costs of the clusters built using thecentralized approach are better than those of the clustersbuilt using the distributed approach. Also, the differencebetween the two costs becomes bigger when the networksize increases. For example, the distributed approach allowsincreasing the quality of the clusters by 500 percent in anetwork with 100 nodes. Consequently, the clusters built bythe centralized approach are better than those created bythe distributed approach. The reason that explains thisresult is that in the distributed approach the nodes haveinformation limited to their node in their neighbor, whereasthe centralized approach finds the better cluster among allthe possible combination.

In order to the build the clusters, the two approachesneed to communicate messages between the networknodes. Indeed, on the one hand, in the distributedapproach the nodes have to send the message that allowscreating the clusters (e.g., informing the other nodes withregard to the QoS required by the applications); on the

other hand, the nodes in the centralized approach have tosend their information to a central node that collects all ofthese information and runs the algorithm to build theclusters. The energy consumed by sending and receivingthese messages should not be neglected. Consequently, weconducted experiences to measure the energy consumed bythe two approaches in order to build the clusters.

In the centralized approach, the central node needs thefollowing information in order to run the algorithm of clusterbuilding: 1) the coordinates of each node in order to build thegraph Gr and calculate the cluster costs, 2) the measurementof each node in order to build the graph Gr, and 3) the activeflag of each node (i.e., the flag has to indicate whetherthe node is able to cover its zone or not).

We conducted several simulations using OMNET++ [19].We use the multihop network communication model.Fig. 10 shows the energy consumed to build the clustersby the centralized and distributed approaches. Resultsshow that the distributed approach needs less energy


Fig. 7. Comparing solution costs (tabu search and CPLEX): Random topology.

Fig. 8. Comparing execution time (tabu search and CPLEX): Square topology.

consumption than the centralized approach and the gapbetween these energies becomes bigger when the networksize increases. The reason behind this result is that thecentral node needs to generate a considerable number ofmessages in order to collect all the node information. Weconclude that the central approach is less efficient than thedistributed approach in the cluster building phase.

We conducted other simulations in order to comparethe total consumed energy (i.e., energy consumed bybuilding algorithms and the energy consumed during thedata collection phase) for the central and distributedapproaches. The results of our approaches are comparedto those generated by an existing data collection algorithm,called TAG [13]. This algorithm was chosen due to itsimportance in sensor networks. It is actually used for datacollection in TinyOS, which is the most common operatingsystem in sensor networks. Fig. 11 illustrates the totalenergy consumption as network sizes vary. For the threealgorithms, energy consumption increases proportionallyto the number of nodes, a natural phenomenon. However,the slope of the curve that represents our approaches is

much less steep than the one associated with the TAGalgorithm, which means that the node number increase hasless impact on our approach as compared to TAG. This isdue to the fact that our system can filter data sensed bynodes within the same zone much better than in TAG. Thesecond conclusion is that, when the data collection phase isconsidered, the performance of the central approach isbetter than the distributed approach due to the fact thatthe cluster building is more efficient in the centralapproach.

5.4 Comparing Tabu Search-Based to SimulatedAnnealing-Based Approaches

Simulated annealing is a probabilistic algorithmic approachto solve optimization problems. Kirkpatrick et al. [12] use itto solve combinatorial optimization problems. Simulatedannealing allows for a given optimization problem toaccept solutions that degrade cost; even if later, suchaccepted solutions will be ignored when they fail toimprove the best solution. Simulated annealing decideswhether to reject or accept a solution that degrades costs


Fig. 9. Comparing solution cost (distributed and centralized approaches).

Fig. 10. Comparing the energy consumption to build the clusters (centralized and distributed approaches).

randomly. The same query and topology were used tocompare the results obtained by tabu search and bysimulated annealing. Fig. 12 presents a comparison withsimulated annealing. Generally, the tabu search algorithmperforms better than the simulated annealing algorithm.

6 CONCLUSION

This paper has presented a heuristic approach based on atabu search to solve clustering problems where the numbersof clusters and cluster heads are unknown beforehand. Toour knowledge, this is the first time that the clusteringproblem is modeled and resolved with these constraints.The tabu search adaptation consists of defining three typesof moves that allow reassigning nodes to clusters, selectingcluster heads, and removing existing clusters. Such moves

use the largest size clique in a feasibility cluster graph,which facilitates the analysis of several solutions and makesit possible to compare them using a gain function.

The performance of this novel approach was evaluatedwith different network sizes and topologies. The perfor-mance is compared to that obtained by a second resolutionCPLEX-based method, a third approach based on simulatedannealing heuristic, and an existing algorithm (TAG).Finally, results show that a tabu search-based resolutionmethod provides quality solutions in terms of cluster costand execution time. Furthermore, it behaves well withnetwork extensibility. Nevertheless, compared to a distrib-uted approach, this centralized approach suffers from amajor drawback linked to the additional costs generated bycommunicating the network node information and the timerequired to solve an optimization problem.


Fig. 11. Comparing the energy consumption (centralized, distributed, and TAG approaches).

Fig. 12. Comparing the solution cost (tabu search and simulated annealing).

We conducted several experiences to compare theperformance of our central approach with those of adistributed approach and we conclude that the centralapproach is less efficient than the distributed approach inthe cluster building phase. Nevertheless, the centralapproach is more efficient in the data collection phase.Consequently, the central approach is more efficient in acase where the data collection phase is long. Otherwise, thedistributed approach should be used to run the queries witha short execution time.

REFERENCES

[1] P.K. Agarwal and C.M. Procopiuc, “Exact and ApproximationAlgorithms for Clustering,” Algorithmica, vol. 33, no. 2, pp. 201-226, June 2002.

[2] P. Basu and J. Redi, “Effect of Overhearing Transmissions onEnergy Efficiency in Dense Sensor Networks,” Proc. Third Int’lSymp. Information Processing in Sensor Networks (IPSN ’04), pp. 196-204, Apr. 2004.

[3] A. El Rhazi and S. Pierre, “A Data Collection Algorithm UsingEnergy Maps in Sensor Networks,” Proc. Third IEEE Int’l Conf.Wireless and Mobile Computing, Networking, and Comm. (WiMob ’07),2007.

[4] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “OptimalEnergy Aware Clustering in Sensor Networks,” Sensors, pp. 258-269, 2002.

[5] F. Glover, E. Taillard, and D. Werra, “A User’s Guide to TabuSearch,” Annals of Operations Research, vol. 41, no. 14, pp. 3-28, May1993.

[6] M. Gondran and M. Minoux, Graphes et Algorithmes, second ed.Editions Eyrolles, 1985.

[7] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “AnApplication Specific Protocol Architecture for Wireless Micro-sensor Networks,” IEEE Trans. Wireless Comm., vol. 1, no. 4,pp. 660-670, Oct. 2002.

[8] J.J. Lee, B. Krishnamachari, and C.C.J. Kuo, “Impact of Hetero-geneous Deployment on Lifetime Sensing Coverage in SensorNetworks,” Proc. IEEE Sensor and Ad Hoc Comm. and Networks Conf.(SECON ’04), pp. 367-376, 2004.

[9] W. Liang and Y. Liu, “Online Data Gathering for MaximizingNetwork Lifetime in Sensor Networks,” IEEE Trans. MobileComputing, vol. 6, no. 1, pp. 2-11, Jan. 2007.

[10] S. Jeremy, The Boost Graph Library: User Guide and Reference Manual.Addison-Wesley, 2002.

[11] T. Kanugo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silver-man, and A.Y. Wu, “A Local Search Approximation Algorithm fork-Means Clustering,” Proc. 18th Ann. ACM Symp. ComputationalGeometry (SoCG ’02), pp. 10-18, 2002.

[12] S. Kirkpatrick, C.C. Gelatt Jr., and M.P. Vecchi, “Optimization bySimulated Annealing,” Science, vol. 220, pp. 671-680, 1983.

[13] S.R. Madden, M.J. Franklin, J.M. Hellerstein, and W. Hong, “TAG:Tiny Aggregation Service for Ad-Hoc Sensor Networks,” Proc.Fifth Symp. Operating Systems Design and Implementation (OSDI ’02),pp. 131-146, 2002.

[14] O. Moussaoui, A. Ksentini, M. Naimi, and M. Gueroui, “A NovelClustering Algorithm for Efficient Energy Saving in WirelessSensor Networks,” Proc. Seventh Int’l Symp. Computer Networks(ISCN ’06), pp. 66-72, 2006.

[15] S. Raghuwanshi and A. Mishra, “A Self-Adaptive ClusteringBased Algorithm for Increased Energy-Efficiency and Scalabilityin Wireless Sensor Networks,” Proc. IEEE 58th Vehicular TechnologyConf. (VTC ’03), vol. 5, pp. 2921-2925, 2003.

[16] O. Younis and S. Fahmy, “Distributed Clustering in Ad-HocSensor Networks: A Hybrid, Energy-Efficient Approach,” Proc.IEEE INFOCOM, pp. 629-640, 2004.

[17] http://boost.org/libs/random/index.html, Feb. 2007.[18] http://www.ilog.com/products/cplex/, Feb. 2008.[19] http://www.omnetpp.org/, Feb. 2008.

Abdelmorhit El Rhazi received the bachelor’sdegree in software engineering from the �EcoleMohammadia d’Ingenieur, Rabat, Morocco, in1995 and the master’s degree in computerengineering from the �Ecole Polytechnique deMontreal in 2003. He is currently with the MobileComputing and Networking Research Labora-tory (LARIM), Department of Computer Engi-neering, �Ecole Polytechnique de Montreal. Hiswork revolved around data collection and the

energy consumption in wireless sensor networks.

Samuel Pierre is currently a professor ofcomputer engineering at �Ecole Polytechniquede Montreal, where he is the director of theMobile Computing and Networking ResearchLaboratory (LARIM) and an NSERC/Ericssonindustrial research chair in next-generationmobile networking systems. He is the author orcoauthor of six books, 15 book chapters,16 edited books, and more than 350 othertechnical publications including journal and

proceedings papers. He received the Best Paper Award from the NinthInternational Workshop in Expert Systems and Their Applications(France, 1989), the Distinguished Paper Award from OPNETWORK2003 (Washington), a special mention from Telecoms Magazine(France, 1994) for one of his coauthored books, Telecommunicationset Transmission de Donnees (Eyrolles, 1992), among others. Hisresearch interests include wireline and wireless networks, mobilecomputing, performance evaluation, artificial intelligence, and electroniclearning. He is an associate editor of the IEEE Communications Letters,the IEEE Canadian Journal of Electrical and Computer Engineering, andthe IEEE Canadian Review. He is also a regional editor of the Journal ofComputer Science. He also serves on the editorial board of Telematicsand Informatics (Elsevier Science) and the International Journal ofTechnologies in Higher Education (IJTHE). He is a fellow of theEngineering Institute of Canada, a member of the Canadian Academy ofEngineering, and a senior member of the IEEE.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.