Localized Multicast: Efficient and Distributed Replica Detection in Large-Scale Sensor Networks

Bo Zhu, Member, IEEE, Sanjeev Setia, Sushil Jajodia, Senior Member, IEEE, Sankardas Roy, Member, IEEE, and Lingyu Wang, Member, IEEE

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 9, NO. 7, JULY 2010

B. Zhu and L. Wang are with the Concordia Institute for Information Systems Engineering, Concordia University, 1515 Ste-Catherine Street West, Suite: EV007.639, Montreal, QC H3G 2W1, Canada. E-mail: {zhubo, wang}@ciise.concordia.ca.

S. Setia, S. Jajodia, and S. Roy are with the Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030. E-mail: {setia, jajodia}@gmu.edu, [email protected].

Manuscript received 31 July 2008; revised 30 July 2009; accepted 17 Oct. 2009; published online 23 Feb. 2010. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2008-07-0301. Digital Object Identifier no. 10.1109/TMC.2010.40.

1536-1233/10/$26.00 © 2010 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS.

Abstract: Due to the poor physical protection of sensor nodes, it is generally assumed that an adversary can capture and compromise a small number of sensors in the network. In a node replication attack, an adversary can take advantage of the credentials of a compromised node to surreptitiously introduce replicas of that node into the network. Without an effective and efficient detection mechanism, these replicas can be used to launch a variety of attacks that undermine many sensor applications and protocols. In this paper, we present a novel distributed approach called Localized Multicast for detecting node replication attacks. The efficiency and security of our approach are evaluated both theoretically and via simulation. Our results show that, compared to previous distributed approaches proposed by Parno et al., Localized Multicast is more efficient in terms of communication and memory costs in large-scale sensor networks, and at the same time achieves a higher probability of detecting node replicas.

Index Terms: Wireless sensor networks security, node replication attack detection, distributed protocol, efficiency.

1 INTRODUCTION

A new set of security challenges arises in sensor networks because current sensor nodes lack hardware support for tamper-resistance and are often deployed in unattended environments where they are vulnerable to capture and compromise by an adversary. A serious consequence of node compromise is that once an adversary has obtained the credentials of a sensor node, it can surreptitiously insert replicas of that node at strategic locations within the network. These replicas can be used to launch a variety of insidious and hard-to-detect attacks on the sensor application and the underlying networking protocols. This type of attack is called a node replication attack, which was first identified and studied by Parno et al. [14].

In a centralized approach for detecting node replication, when a new node joins the network, it broadcasts a signed message (referred to as a location claim) containing its location and identity to its neighbors. One or more of its neighbors then forward this location claim to a central trusted party [4] (e.g., the base station). With location information for all the nodes in the network, the central party can easily detect any pair of nodes with the same identity but at different locations. Like all centralized approaches, however, this solution is vulnerable to a single point of failure. If the base station is compromised or the path to the base station is blocked, adversaries can add an arbitrary number of replicas into the network without being detected. Hence, a distributed solution is desirable.

Distributed approaches for detecting node replications are based on storing a node’s location information at one or more witness nodes in the network. When a new node joins the network, its location claim is forwarded to the corresponding witness nodes. If any witness receives two different location claims for the same node identity (ID), it will have detected the existence of a replica and can take appropriate actions to revoke the node’s credentials.

The basic challenge of any distributed protocol in detecting node replicas is to minimize communication and per-node memory costs while ensuring that the adversary cannot defeat the protocol. A protocol that deterministically maps a node’s ID to a unique witness node would minimize both communication costs and memory requirements per node, but would not offer enough security because the adversary would need to compromise just a single witness node in order to be able to introduce a replica without being detected.

Previously, Parno et al. [14] presented two distributed algorithms for detecting node replication in which the witness nodes for a node’s location information are randomly selected among all the nodes in the network. In the Randomized Multicast algorithm, each location has $\sqrt{n}$ witness nodes. Thus, in a network of $n$ nodes, according to the Birthday Paradox, in the event of a node replication attack, at least one witness node is likely to receive conflicting location claims about a particular node. The communication costs of this protocol are $O(n^2)$ (for the entire network) and the memory requirements per node are $O(\sqrt{n})$. The Line-Selected Multicast exploits the routing topology of the network to select witnesses for a node’s location and uses



geometric probabilities to detect replicated nodes. It has a communication cost of $O(n\sqrt{n})$ and memory requirements per node of $O(\sqrt{n})$.

Recently, Conti et al. proposed another replica detection protocol, RED [2]. Compared to Parno et al.’s work [14], in RED each location has a smaller number of witnesses. The set of witnesses is chosen uniformly from the whole network by means of a pseudorandom function whose inputs include the identity of the node, the number of locations (of witnesses) that have to be generated by any neighbor of this node that decides to forward the location claim, and a random number rand that is changed in each iteration. In other words, within each iteration, the set of witnesses for any node is fixed and is known to anyone who knows rand, whether through node compromise or by sniffing the broadcast message containing the value of rand at the beginning of each iteration. Therefore, there is a dilemma in selecting an appropriate value for the number of locations (of witnesses) to be generated so as to balance efficiency against robustness to node compromise.
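The Birthday Paradox argument behind Randomized Multicast can be illustrated with a rough bound. As a sketch only, assume that two conflicting claims each reach $w$ witnesses drawn independently and uniformly from the $n$ nodes (this independence model and the symbol $P$ are simplifying assumptions made here for illustration, not the analysis of [14]). The probability that no node receives both claims is then roughly:

```latex
\[
  P_{\text{no collision}} \approx \left(1 - \frac{w}{n}\right)^{w} \approx e^{-w^{2}/n},
  \qquad\text{so for } w = \sqrt{n}:\quad
  P_{\text{detect}} \approx 1 - e^{-1} \approx 0.63 .
\]
```

Under these assumptions, about $\sqrt{n}$ witnesses per claim already give a better-than-even chance that some witness sees the conflict, which is why the per-node memory cost scales with $\sqrt{n}$.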

In this paper, we present a novel distributed protocol for detecting node replication attacks that takes a different approach for selecting witnesses for a node. In our approach, which we call Localized Multicast, the witness nodes for a node identity are randomly selected from the nodes that are located within a geographically limited region (referred to as a cell). Our approach first deterministically maps a node’s ID to one or more cells, and then uses randomization within the cell(s) to increase the resilience and security of the scheme. One major advantage of our approach is that the probability of detecting node replicas is much higher than that achieved in Parno et al.’s protocols [14].

We describe and analyze two variants of the Localized Multicast approach: Single Deterministic Cell (SDC) and Parallel Multiple Probabilistic Cells (P-MPC), which, as their names suggest, differ in the number of cells to which a location claim is mapped and the manner in which the cells are selected. We evaluate the performance and security of these approaches both theoretically and via simulation. Our results show that the Localized Multicast approach is more efficient than Parno et al.’s algorithms in terms of communication and memory costs, while providing a high level of compromise-resilience. Further, our approach also achieves a higher level of security in terms of the capability of detecting node replicas.

The rest of the paper is organized as follows: In Section 2, we review previous research related to detecting node replication in sensor networks. In Section 3, we present the system, network, and adversary models of our work. We then propose two variants of the Localized Multicast approach in Section 4. Theoretical analyses of the security and efficiency of the Single Deterministic Cell scheme and the Parallel Multiple Probabilistic Cells scheme are presented in Sections 5 and 6, respectively. The simulation results are shown in Section 7. Finally, we draw our conclusions in Section 8.

2 RELATED WORK

The methods of detecting node replication can be divided into two categories: centralized and distributed.

The general idea of centralized solutions was first described in [14]. More specifically, each sensor’s location information is forwarded toward a centralized trusted party, usually the base station, which takes responsibility for identifying repeated identities at distinct locations. A more concrete protocol, SET [1], was later proposed by Choi et al.; it is based on the idea of computing set operations (intersection and union) of exclusive subsets in the network. In SET, a distributed algorithm is performed to divide the network into exclusive subsets and select subset leaders (SLDRs). Each exclusive subset is securely formed among one-hop neighbors. Afterwards, in the basic scheme, each SLDR forwards a summarized report to the base station directly. In the subset-tree scheme, multiple subset trees, whose nodes are SLDRs, are constructed. For each subset tree, a root SLDR aggregates reports from other leaf SLDRs, and then forwards the final report to the base station. Upon receiving all the reports, the base station verifies the validity of the reports and detects node replicas.

Parno et al. [14] were the first to propose distributed algorithms for detecting node replication attacks in sensor networks. The authors first described two preliminary approaches, i.e., Node-to-Network Broadcasting and Deterministic Multicast, and discussed their weaknesses. Then, the Randomized Multicast and the Line-Selected Multicast were proposed. In Sections 6.3 and 7, we compare the performance and effectiveness of our approaches to their schemes.

Recently, Conti et al. proposed a new distributed protocol, called RED [2], for detecting node replication attacks. Compared to Parno et al.’s work [14], RED has a smaller memory overhead. In addition, since the set of witnesses is chosen uniformly within the network, RED is more robust against selective node compromise, although it has a slightly lower detection rate under random node compromise. Their scheme can be viewed as a variant of deterministic multicast, which has a weakness in determining an appropriate number of deterministic witness nodes that satisfies both security and efficiency requirements [14]. In RED, this weakness is mitigated by changing the witness nodes for any given identity after each time interval, although they are deterministic within any time interval.

An attack that is superficially similar to node replication is the Sybil attack [3]. In this attack, a single physical adversary can generate a number of virtual identities and falsely claim to be a set of nonexistent nodes. Douceur [3] proposed a few schemes in which the potential Sybil users are challenged to solve some resource-intensive task that can only be accomplished by multiple real-world users but will be impractical for a Sybil source. In contrast, in node replication attacks, a single adversary can generate a number of physical nodes with the same identity and put them at different locations in the network. In other words, each replica is a real physical node, instead of a virtual one. As a result, the detection mechanism proposed in [3] fails to detect node replication. In [13], Newsome et al. proposed a few mechanisms for detecting Sybil attacks in sensor networks, among which only the centralized node registration mechanism can be used to detect node replication.

This paper extends an earlier version of the work [18] in important new ways. First, we add a discussion of potential attacks against SDC, e.g., blocking attacks. Second, the security analysis of the resilience against node compromise is revisited to provide a more accurate analysis. Last but not least, the security and efficiency of our approach are evaluated under different settings.

3 PROTOCOL FRAMEWORK

In this section, we present the system, network, and adversary models assumed in our work, as well as the notation and symbols used in the paper.

3.1 System and Network Model

We consider a sensor network with a large number of low-cost nodes distributed over a wide area. In our approach, we assume the existence of a trusted base station, and the sensor network is considered to be a geographic grid, each unit of which is called a cell. Sensors are distributed uniformly in the network. New sensors may be added into the network regularly to replace old ones.

Each node is assigned a unique identity and a pair of identity-based public and private keys (see footnote 1) by an offline Trust Authority (TA). In identity-based signature schemes like [6], a node’s private key is generated by signing its public key (usually a hash of its unique identity) with a master secret held only by the TA. In other words, generating a new identity-based key pair requires the cooperation of the TA. Therefore, we assume that adversaries cannot easily create sensors with new identities, in the sense that they cannot generate the private keys corresponding to the identities claimed and thus fail to prove themselves to the neighbors during the authentication of the location claims.

Similar to [14], we require that, when a node is added into the network, it generates a location claim and broadcasts the claim to its neighbors. Each neighbor independently decides, with a given probability, whether to forward the claim. Neighbors that decide to forward the claim determine the destination cell(s) according to the output of a geographic hash function [15], which uniquely maps the identity of the sender of the location claim to one or a few of the cells in the grid. Then, the claim is forwarded to the destination cell(s) using a geographic routing protocol such as GPSR [7].

3.2 Adversary Model

In this paper, we assume that the major goal of adversaries is to launch node replication attacks. To achieve this goal, we assume that adversaries may launch both passive attacks (e.g., eavesdropping on network traffic) and active attacks (e.g., modifying and replaying messages or compromising sensors), and the information obtained from the former can be used to enhance the effectiveness of the latter. For example, by sniffing the traffic, adversaries may deduce certain information about the witness nodes, which could help them evaluate the potential benefit of compromising a given node and the risk of being detected while launching the node replication attack at a given location.

We assume the existence of some monitoring mechanism that can detect a node compromising operation with a certain probability. We also assume that adversaries are rational, and thus, may try to avoid triggering any automated protocol (e.g., SWATT [17]) that sweeps the network to remove compromised nodes, or drawing human attention or intervention while launching the attacks.

3.3 Notation

In Table 1, we list the notation and symbols used in this paper.

4 THE LOCALIZED MULTICAST APPROACH FOR DETECTING NODE REPLICATIONS

We have designed two variants of the Localized Multicast approach, specifically Single Deterministic Cell (SDC) and Parallel Multiple Probabilistic Cells (P-MPC).

4.1 Single Deterministic Cell

In the Single Deterministic Cell scheme, a geographic hash function [15] is used to uniquely and randomly map node L’s identity to one of the cells in the grid. For example, given that the geographic grid consists of $a \times b$ cells, a cell at the $a'$th row and the $b'$th column (where $a' \in \{1, \ldots, a\}$, $b' \in \{1, \ldots, b\}$) is uniquely identified as $c$ (where $c = a' \cdot b + b'$). By using a one-way hash function $H()$, node L is mapped to a cell C, where $c = [H(ID_L) \bmod (a \cdot b)] + 1$.
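The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for the one-way hash H() (the text does not name a specific function), identities are encoded as UTF-8 strings, and the helper names cell_index and cell_position are hypothetical. The sketch also assumes row-major numbering c = (a' - 1) * b + b', so that cell numbers run from 1 to a * b and match the range of the modulo formula.

```python
import hashlib
from typing import Tuple


def cell_index(node_id: str, rows: int, cols: int) -> int:
    """Map a node identity to a cell number in 1..rows*cols."""
    # Assumption: SHA-256 plays the role of the one-way hash H().
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    h = int.from_bytes(digest, "big")
    return (h % (rows * cols)) + 1


def cell_position(c: int, cols: int) -> Tuple[int, int]:
    """Recover the 1-based (row, column) of cell c.

    Assumption of this sketch: row-major numbering,
    c = (row - 1) * cols + col.
    """
    row, col = divmod(c - 1, cols)
    return row + 1, col + 1


if __name__ == "__main__":
    c = cell_index("node-42", rows=10, cols=10)
    print(c, cell_position(c, cols=10))
```

Because the mapping depends only on the identity and the grid dimensions, every neighbor that forwards a claim for the same identity computes the same destination cell without any coordination.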

The format of the location claim is

$[ID_L, l_L, SIG_{SK_L}(H(ID_L \| l_L))]$,

where $\|$ denotes the concatenation operation and $l_L$ is the location information of L, which can be expressed using either two-dimensional or three-dimensional coordinates.

TABLE 1. Notation and Symbols

Footnote 1. Recent work [11], [5] shows that public key algorithms are practical on new sensor hardware. In addition, similar to [14], we can use symmetric key cryptography instead to reduce the computation cost, at the cost of a larger communication overhead.

When L broadcasts its location claim, each neighbor first verifies the plausibility of $l_L$ (e.g., based on its own location and the transmission range of the sensor) and the validity of the signature in the location claim. In identity-based signature schemes [6], only a signature generated with the private key corresponding to the identity claimed can pass the validation process. Thus, adversaries cannot generate valid signatures unless they compromise the node with that identity.
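One simple way a neighbor might check the plausibility of $l_L$ is a range test. The sketch below is an assumption-laden illustration, not the paper's procedure: it assumes two-dimensional coordinates, Euclidean distance, and a fixed circular transmission range. The function name location_plausible is hypothetical, and signature verification is deliberately left out.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def location_plausible(neighbor_pos: Point, claimed_pos: Point, tx_range: float) -> bool:
    """Return True if the claimed position lies within radio range of the neighbor.

    Assumption: a neighbor can only hear nodes within tx_range of its own
    position, so a claim from farther away cannot be genuine.
    """
    return math.dist(neighbor_pos, claimed_pos) <= tx_range


if __name__ == "__main__":
    print(location_plausible((0.0, 0.0), (3.0, 4.0), tx_range=5.0))
```

A claim that fails this test would simply be dropped by the neighbor rather than forwarded.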

Each neighbor independently decides whether to forward the claim with a probability $p_f$. If a neighbor plans to forward the location claim, it first needs to execute a geographic hash function [15] to determine the destination cell, denoted as C. The location claim is then forwarded toward cell C.

Once the location claim arrives at cell C, the sensor receiving the claim first verifies the validity of the signature, and then checks whether cell C is indeed the cell corresponding to the identity listed in the claim message based on the geographic hash function. If both verifications succeed, the location claim is flooded within cell C. Each node in the cell independently decides whether to store the claim with a probability $p_s$. Note that the flooding process is executed only when the first copy of the location claim arrives at cell C, and the following copies are ignored. As a result, the number of witnesses in the cell, w, is $s \cdot p_s$ on average, where s is the number of sensors in a cell.

Whenever any witness receives a location claim with the same identity but a different location compared to a previously stored claim, it forwards both location claims to the base station. Then, the base station will broadcast a message within the network to revoke the replicas.
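The witness-side check can be sketched as follows. This is a minimal illustration under several assumptions: a claim is reduced to an (identity, location) pair, signature verification is omitted, the node has already decided (with probability $p_s$) to act as a witness, and the returned pair merely represents the evidence that would be forwarded to the base station. The class name Witness and its method names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[float, float]
Claim = Tuple[str, Location]  # (node identity, claimed location); signature omitted


class Witness:
    """Stores one location claim per identity and flags conflicting claims."""

    def __init__(self) -> None:
        self.stored: Dict[str, Location] = {}

    def receive(self, claim: Claim) -> Optional[Tuple[Claim, Claim]]:
        """Return the conflicting pair of claims, or None if there is no conflict."""
        node_id, location = claim
        previous = self.stored.get(node_id)
        if previous is None:
            # First claim seen for this identity: remember it.
            self.stored[node_id] = location
            return None
        if previous != location:
            # Same identity, different location: evidence of a replica.
            return (node_id, previous), claim
        return None


if __name__ == "__main__":
    w = Witness()
    w.receive(("A", (1.0, 2.0)))
    print(w.receive(("A", (5.0, 5.0))))
```

Keeping only one stored location per identity is enough here, because a single conflicting pair already proves that a replica exists.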

Compared to the Randomized Multicast and Line-Selected Multicast algorithms, a major advantage of SDC is that it ensures a 100 percent success rate for detecting any node replication, as long as the location claim is successfully forwarded toward cell C and stored by at least one node in the cell.

An important limitation of the Randomized Multicast and Line-Selected Multicast algorithms is that both the communication/memory overhead and the security (in terms of the success rate of detecting node replications) of the two algorithms are tightly related to the number of witnesses (w). On the one hand, the larger w is, the higher the communication and memory overhead. On the other hand, the smaller w is, the lower the success rate of detecting node replication. To ensure a high success rate of detecting node replication, w has to be $O(\sqrt{n})$, where n is the number of sensors in the network.

In contrast, in the SDC scheme the communication cost and memory overhead are related to the number of neighbors that forward a location claim (i.e., $r = d \cdot p_f$) and the number of witnesses (i.e., $w = s \cdot p_s$), respectively. In addition, the success rate of detecting node replication is independent of w when $w \geq 1$. Moreover, the randomization against potential node compromise and low memory overhead are achieved through flooding the location claim within the destination cell while storing it on only a small number of randomly chosen nodes. Assuming that the capability of the adversary (in terms of the number of nodes that can be compromised without being detected) is limited, by appropriately choosing the cell size (s) and $p_s$, the probability that adversaries control all the witnesses for an identity is negligible. Consequently, SDC can achieve a low communication cost by setting r to a small value, and at the same time ensure low memory overhead and good security (i.e., a high success rate of detecting node replication and a high level of resilience against potential node compromise), by choosing an appropriate value for w (actually, for s and $p_s$). A detailed analysis of the security and efficiency achieved in SDC is presented in Section 5.
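The role of $s$ and $p_s$ can be illustrated with a short calculation. As a sketch, assume that each of the $s$ nodes in the destination cell stores the flooded claim independently with probability $p_s$, as described above. The numerical values in the second line are made up purely for illustration and do not come from the paper.

```latex
\[
  E[w] = s \cdot p_s, \qquad
  \Pr[w \ge 1] = 1 - (1 - p_s)^{s}.
\]
\[
  \text{E.g., } s = 100,\ p_s = 0.05:\quad
  E[w] = 5, \qquad \Pr[w \ge 1] = 1 - 0.95^{100} \approx 0.994 .
\]
```

Under these assumptions, a handful of witnesses per cell is enough to make the case of no witness at all rare, while each node stores only a small fraction of the claims flooded through its cell.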

4.2 Parallel Multiple Probabilistic Cells

4.2.1 Motivation

In this paper, we assume the existence of a monitoring mechanism that can detect a node compromising operation with a certain probability. Therefore, the larger the number of nodes that an adversary attempts to compromise, the higher is the probability that the node compromising attack is detected, thereby triggering an automated protocol or human intervention for removing compromised nodes. However, in certain cases (e.g., when the number of nodes in a cell is relatively small), a determined adversary may be willing to take the risk of being detected in return for a high probability of controlling all the witness nodes for one or more identities.

Another potential risk is that a smart adversary can take advantage of the knowledge that the destination cell for a given identity is deterministic and launch a blocking attack. Informally, after compromising a small set of sensors denoted as V, the adversary can generate replicas of members in V and deploy them in such a way that all the location claims of these replicas are forwarded through members of V.

In the SDC approach, all the location claims are first forwarded from the neighbors of L to a deterministic cell. Therefore, there is a high probability that these forwarding paths intersect with each other. In particular, when L and the destination cell (i.e., cell C) are far from each other, there is a high probability that all the location claims will pass through one or a small set of nodes of size y. Therefore, the adversary only needs to compromise one or y nodes per replica so as to block the forwarding of a location claim. Hop-by-hop watchdog monitoring [12] may help mitigate this attack. However, it will fail if all or most of the neighbors of an intersection point are compromised.

Even worse, the adversary can insert a replica in such a way that its location claim will always be forwarded through a small set of compromised nodes. An example of a blocking attack against the SDC approach is shown in Fig. 1. Cells $C_1$ and $C_2$ are the deterministic cells for the identities $ID_{C_1}$ and $ID_{C_2}$, respectively, and B is an area in which all the nodes have been compromised (referred to as a black hole). In this example, three replicas (i.e., $L^1_{C_1}$, $L^2_{C_1}$, and $L^3_{C_1}$) claiming the same identity that is mapped to cell $C_1$ are added to the network sequentially, with a certain time interval between any pair of consecutive joins. In the SDC approach, nodes en route between the replica and the deterministic cell do not store the location claim. As a result, as long as the location claims from different replicas do not arrive at the same time, forwarding nodes are not able to detect the conflicts. Finally, all the location claims are delivered to the black hole and blocked. In other words, adversaries can insert replicas without being detected. Note that the same black hole may be used to insert replicas for multiple identities. As shown in Fig. 1, two replicas (i.e., $L^1_{C_2}$ and $L^2_{C_2}$) claiming the same identity that is mapped to cell $C_2$ are inserted into the network and their location claims are also blocked by the black hole B.

4.2.2 Description of the P-MPC Scheme

Like SDC, in the P-MPC scheme, a geographic hash function [15] is employed to map node L’s identity to the destination cells. However, instead of mapping to a single deterministic cell, in P-MPC, the location claim is mapped and forwarded to multiple deterministic cells with various probabilities.

Let $C = \{C_1, C_2, \ldots, C_i, \ldots, C_v\}$ denote the set of cells to which an identity (denoted as $ID_L$) is mapped. Let $p_{c_i}$ denote the probability that the location claim of L is forwarded to cell $C_i$. Without loss of generality, in the rest of this paper, we assume that set $C$ is sorted by the $p_{c_i}$’s. The following two conditions should be satisfied while determining the $p_{c_i}$’s: 1) $\sum_{i=1}^{v} p_{c_i} = 1$ and 2) $p_{c_i} \geq p_{c_j}$ when $i < j$ for $i, j \in \{1, 2, \ldots, v\}$. The second condition is introduced to enhance the efficiency of the protocol as described later in Section 6.2. An example of P-MPC is shown in Fig. 2.

When L broadcasts its location claim, each neighbor independently decides whether to forward the claim in the same way as in the SDC scheme. Afterwards, each neighbor helping forward the claim first calculates the set of cells (i.e., C) to which L is mapped, based on a geographic hash function with the input of $ID_L$. For example, by using a one-way hash function $H()$, node L is mapped to the set of cells $C = \{C_1, C_2, \ldots, C_i, \ldots, C_v\}$, where $C_i = [H(ID_L \| i) \bmod (a \times b)] + 1$ for $i \in \{1, 2, \ldots, v\}$. Then, each neighbor that forwards the claim independently generates a random number $z \in [0, 1)$. If j is the smallest number that satisfies $z < \sum_{i=1}^{j} p_{c_i}$, with $j \in \{1, 2, \ldots, v\}$, this neighbor chooses the jth cell (i.e., $C_j$) as the destination cell for the location claim. For example, if $z = 0.8$ and the predetermined distribution of the $p_{c_i}$ is ($p_{c_1} = 50\%$, $p_{c_2} = 25\%$, $p_{c_3} = 15\%$, and $p_{c_4} = 10\%$), the claim will be forwarded to cell $C_3$.
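The mapping and selection steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: SHA-256 stands in for the unspecified one-way hash $H()$, the byte encoding of $ID_L \| i$ is an assumption, and the function names are hypothetical. The sketch does not handle the case where two indices hash to the same cell.

```python
import hashlib


def mapped_cells(node_id: str, v: int, a: int, b: int) -> list[int]:
    """Cells C_1..C_v for an identity: C_i = [H(ID_L || i) mod (a*b)] + 1.

    SHA-256 and the "id|i" encoding are assumptions made for illustration.
    """
    cells = []
    for i in range(1, v + 1):
        digest = hashlib.sha256(f"{node_id}|{i}".encode()).digest()
        cells.append(int.from_bytes(digest, "big") % (a * b) + 1)
    return cells


def choose_index(z: float, probs: list[float]) -> int:
    """Smallest 1-based j such that z < p_c1 + ... + p_cj."""
    cumulative = 0.0
    for j, p in enumerate(probs, start=1):
        cumulative += p
        if z < cumulative:
            return j
    return len(probs)  # guard against floating-point rounding near z = 1


# The example from the text: z = 0.8 with (50%, 25%, 15%, 10%) selects C_3.
j = choose_index(0.8, [0.50, 0.25, 0.15, 0.10])
cells = mapped_cells("node-42", v=4, a=10, b=10)
print(j, cells[j - 1])
```

A forwarding neighbor would draw z with a random number generator; a fixed z is used here only to reproduce the worked example.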

Once the location claim arrives at cell $C_j$, the sensor receiving it first verifies whether $C_j$ is a member of C, which can be calculated based on the geographic hash function and the identity listed in the claim message. In addition, this sensor needs to verify the validity of the signature in the location claim. If both verifications succeed, the claim is flooded within the cell and probabilistically stored at w nodes in the same manner as in the SDC scheme.

For example, in Fig. 2, there are two replicas with the same identity in the network. In this example, an identity is mapped to three cells (i.e., $C_1$, $C_2$, $C_3$) with different probabilities (i.e., $p_{c_1} > p_{c_2} > p_{c_3}$). The neighbors of one replica forward the location claims to cells $C_1$ and $C_2$, while the neighbors of the other replica forward the location claims to cells $C_1$ and $C_3$. Therefore, any witness node within cell $C_1$ can detect the node replication.

5 ANALYSIS OF THE SINGLE DETERMINISTIC CELL SCHEME

In this section, we analyze the security and efficiency of the Single Deterministic Cell scheme.

5.1 Security Analysis

The metrics used to evaluate the security of the SDC scheme are:

1. the probability of detecting node replication when adversaries put x replicas (including the compromised node) with the same identity into the network, which is denoted as $p_{dr}$.

2. the probability that adversaries control all the witnesses for a given identity after compromising t nodes, which is denoted as $p_{ts}$.

3. the probability that adversaries control all the witnesses for at least one identity after compromising t nodes, which is denoted as $p_{tm}$.

The latter two metrics estimate the risk that an adversary controls all the witnesses for a node and can thus launch a node replication attack without being detected.

As in [14], for the theoretical analysis in Sections 5 and 6, we assume that there are r ($= d \cdot p_f$) neighbors forwarding L's location claim. We also assume that there are w ($= s \cdot p_s$) witnesses per destination cell storing L's location claim. Since $1 \ge p_f > 0$ and $1 \ge p_s > 0$, we have $r > 0$ and $w > 0$.

ZHU ET AL.: LOCALIZED MULTICAST: EFFICIENT AND DISTRIBUTED REPLICA DETECTION IN LARGE-SCALE SENSOR NETWORKS 917

Fig. 1. The blocking attacks.

Fig. 2. The parallel multiple probabilistic cells approach.


5.1.1 Detecting Replicas

Unlike the Random Multicast and Line-Selected Multicast algorithms [14], where the nodes storing the copies of a location claim are chosen randomly from the whole network, in SDC such nodes are chosen randomly from a small subset of all the nodes in the network, i.e., the nodes in the destination cell determined by the geographic hash function. In addition, since the location claim will be flooded within the destination cell, the SDC scheme can always detect any pair of nodes claiming the same identity. In other words, $p_{dr} = 100\%$ in SDC when $r > 0$ and $w > 0$.

5.1.2 Resilience against Node Compromise

Assuming that the adversary's capability of compromising nodes is limited, in SDC the probability that an adversary can compromise all the witness nodes storing the location claim of a given identity (i.e., $p_{ts}$) is higher than that in the Randomized Multicast algorithm, because witness nodes in the former are chosen from a smaller set than in the latter. However, we argue that by appropriately choosing the parameters, e.g., the cell size (s) and the probability that a sensor in the cell stores the location claim ($p_s$), we can limit $p_{ts}$ to a very small value, even if the adversaries can compromise a small fraction of the nodes in cell C.

Assuming that the adversary has compromised t nodes in cell C, $p_{ts}$ can be calculated as follows:

$$p_{ts} = \frac{\binom{s-w}{t-w}}{\binom{s}{t}} = \frac{(t-w+1)(t-w+2)\cdots t}{(s-w+1)(s-w+2)\cdots s}, \tag{1}$$

where $t \ge w$. In (1), $\binom{s}{t}$ denotes the number of all possible combinations of compromising t sensors in a cell of size s, and $\binom{s-w}{t-w}$ denotes the number of all possible combinations in which all w witnesses for a given identity are compromised, which is equivalent to the number of combinations in which $t - w$ out of t compromised sensors are chosen from the $s - w$ sensors that are not witnesses for this identity.

In Fig. 3, we plot the probability that an adversary controls all the witness nodes of a given identity (i.e., $p_{ts}$) under different settings, when the cell size is 100 (i.e., $s = 100$). Fig. 3 shows that when w (in fact s and $p_s$) is chosen appropriately, $p_{ts}$ is negligible, even if the adversary can compromise a large number of nodes in the cell. In particular, when $w = 20$ and $t = 60$, $p_{ts}$ is only $7.82 \times 10^{-6}$. Even if w is chosen as a relatively small number, e.g., 5, the adversary still needs to compromise around 65 out of 100 nodes in the cell to achieve a success rate of nearly 11 percent.
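As a quick check of (1), the following sketch evaluates $p_{ts}$ with exact binomial coefficients. The function name `p_ts` is chosen here for illustration and is not taken from the paper.

```python
from math import comb


def p_ts(s: int, w: int, t: int) -> float:
    """Probability that all w witnesses of one identity lie among the
    t compromised nodes of a cell of size s, following (1)."""
    if t < w:
        return 0.0  # fewer compromised nodes than witnesses
    return comb(s - w, t - w) / comb(s, t)


# The two settings quoted in the text for s = 100.
print(p_ts(100, 20, 60))  # about 7.82e-06
print(p_ts(100, 5, 65))   # about 0.11
```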

However, in practice, the probability that the adversary controls all the witnesses for at least one identity (i.e., $p_{tm}$) might be a more accurate and strict measure of the security of the scheme. In order to calculate $p_{tm}$, we begin by estimating the probability that all the w copies of any location claim are stored within a given set T of t compromised nodes, which is denoted as $p_{ts2}$. Given that the members of T and the nodes storing any location claim are chosen randomly from all the nodes in the cell, we have

$$p_{ts2} = \frac{\binom{t}{w}}{\binom{s}{w}} = \frac{(t-w+1)(t-w+2)\cdots t}{(s-w+1)(s-w+2)\cdots s} = p_{ts}, \tag{2}$$

where $t \ge w$.

In SDC, there are on average s different location claims stored within a cell. Since the nodes storing the copies of different location claims are chosen independently, the process of selecting the witnesses for the s location claims such that all of them are members of T is equivalent to a sequence of s Bernoulli trials, with probability $p_{ts2}$ of success in any given trial. Let $n_t$ denote the number of identities for which all the copies of the corresponding location claims are stored within the set T. In other words, all the witness nodes for these $n_t$ identities are controlled by the adversary. As a result, the expectation of $n_t$ and $p_{tm}$ can be calculated according to (3) and (4), respectively.

$$E(n_t) = s \cdot p_{ts2}, \tag{3}$$

$$p_{tm} = 1 - (1 - p_{ts2})^{s}. \tag{4}$$

Fig. 3. Probability that adversaries control all w witnesses for a given identity after compromising t nodes ($p_{ts}$).

Fig. 4. Probability that adversaries control all w witnesses for any identity after compromising t nodes ($p_{tm}$).

In Fig. 4, we plot the probability that adversaries control all the witness nodes for at least one identity (i.e., $p_{tm}$) under different settings, when $s = 100$. We notice that $p_{tm}$ is much higher than $p_{ts}$, especially when w is small. For example, when the average number of witness nodes for a location claim (i.e., w) and the number of nodes controlled by the adversary in the cell (i.e., t) are 5 and 30, respectively, $p_{tm}$ is 17.26 percent, which is much higher than $p_{ts}$, i.e., 0.19 percent. As such, it might be necessary to set w to a larger number, such as 10-15, which corresponds to the situation that, even if the adversary compromises 43-58 out of 100 nodes in a cell, the probability that she launches a node replication attack without being detected is less than 1 percent, i.e., $p_{tm} < 1\%$.
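Equations (2) to (4) can be evaluated in the same way. The sketch below is illustrative only; the function names are hypothetical, and it assumes exactly s location claims per cell, as in the analysis.

```python
from math import comb


def p_ts2(s: int, w: int, t: int) -> float:
    """Probability that all w copies of one claim fall inside a fixed
    set of t compromised nodes in a cell of size s, following (2)."""
    return comb(t, w) / comb(s, w) if t >= w else 0.0


def expected_controlled_identities(s: int, w: int, t: int) -> float:
    """E(n_t) from (3): expected number of fully controlled identities."""
    return s * p_ts2(s, w, t)


def p_tm(s: int, w: int, t: int) -> float:
    """Probability from (4) that at least one of the s identities stored
    in the cell has all of its witnesses compromised."""
    return 1.0 - (1.0 - p_ts2(s, w, t)) ** s


# The example from the text: s = 100, w = 5, t = 30.
print(p_ts2(100, 5, 30))  # about 0.0019
print(p_tm(100, 5, 30))   # about 0.1726
```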

5.2 Efficiency Analysis

The metrics used to evaluate the efficiency of the SDC scheme include:

1. The average number of packets sent and received while propagating the location claim, which is denoted as $n_f$.

2. The average number of copies of the location claims stored on a sensor, which is denoted as $n_s$.

The former measures the communication cost, while the latter estimates the memory overhead. We do not explicitly consider the computation cost (i.e., verifying that the location claim is generated by an entity which holds the private key corresponding to the identity listed in the claim), since every forwarding node needs to execute such a verification and thus the computation cost is proportional to the communication cost. In other words, the higher the communication cost, the higher the computation cost.

5.2.1 Communication Cost

The communication cost of the SDC scheme has two components: the cost of forwarding the location claim to the destination cell (denoted as $CO_{fw}$) and the cost of flooding the location claim within the destination cell (denoted as $CO_{fl}$). The communication complexities of these two operations are $O(d \cdot p_f \cdot \sqrt{n})$ and $O(s)$, respectively.

5.2.2 Memory Overhead

SDC has a memory overhead of $O(w)$, where $w = s \cdot p_s$. As shown in Section 5.1, a relatively small value of w, e.g., between 10 and 15 when $s = 100$, is sufficient to ensure security against node compromise. Therefore, the memory overhead of the SDC scheme is significantly lower than those of the Random Multicast algorithm and the Line-Selected Multicast algorithm, which are of order $O(\sqrt{n})$ or higher.²

6 ANALYSIS OF THE PARALLEL MULTIPLE PROBABILISTIC CELLS SCHEME

In this section, we analyze the security and efficiency of the P-MPC scheme. In addition, a summary of the communication cost and memory overhead of our approach and the algorithms proposed in [14] is shown at the end of this section.

6.1 Security Analysis

For simplicity, in this section we assume that the number of neighbors (r) forwarding the location claim is a fixed number. We assume that the adversary creates $x - 1$ replicas of a given compromised node with ID $ID_L$ and deploys them in the network. We assume that adversaries do not reposition the compromised node, $l_1$, and the replicas are added in sequence from $l_2$ to $l_x$. Let $p_{ir}$ denote the probability that the node replication attack is not detected by our scheme after the ith node with the same identity has been added to the network. For analyzing the security of the P-MPC scheme, we use the same metrics employed in Section 5.1, except that we replace the metric $p_{dr}$ with $p_{ir}$.

6.1.1 Detecting Replicas

Let $C_{s1}$ denote the set of all combinations of choosing 1 to $v - 1$ elements from C, i.e., the set of cells to which $ID_L$ is mapped. If the node replication attack is not detected when the adversary adds replica $l_2$ to the network, it implies that the location claims for $l_2$ have been forwarded to a set of cells, none of which contains any node storing a location claim from $l_1$.

Let $C_{e1}$ denote a subset of the cells in C that do not store the location claims of $l_1$. Let $p_{i,1}$ denote the probability that the location claim of $l_1$ is forwarded to all the cells in C except the cells in $C_{e1}$, which is an element of $C_{s1}$. Let $p_{i,2}$ denote the probability that the location claim of $l_2$ is forwarded to any cell(s) in $C_{e1}$. Therefore, we have:

$$p_{2r} = \sum_{i=1}^{|C_{s1}|} p_{i,1} \cdot p_{i,2}. \tag{5}$$

Now, we consider further the case that the adversary adds $l_3$ to the network. Let $C_{s1b}$ denote the set of all the combinations of choosing 2 to $v - 1$ elements from C. For a given $C_{e1} \in C_{s1b}$, let $C_{s2}$ denote all the combinations of choosing 1 to $|C_{e1}| - 1$ elements from $C_{e1}$. We denote $C_{e2}$ as the set of cells that store the location claim from $l_2$ but not $l_1$, with $C_{e2} \in C_{s2}$. Let $p_i$ denote the probability that the location claim of $l_1$ is forwarded to all the cells in C except the cells in $C_{e1}$, which is an element of $C_{s1b}$. Let $p_{ij,1}$ denote the probability that the location claim of $l_2$ is forwarded only to all the cells in $C_{e2}$. Let $p_{ij,2}$ denote the probability that the location claim of $l_3$ is forwarded to any cell(s) in $C_{e1}$ except those in $C_{e2}$. Thus, we have:

$$p_{3r} = \sum_{i=1}^{|C_{s1b}|} \sum_{j=1}^{|C_{s2}|} p_i \cdot p_{ij,1} \cdot p_{ij,2}. \tag{6}$$

Let $r = 3$ and $v = 3$. In Table 2, we show the estimated success rate of detecting node replications under different settings of $p_{c_i}$ according to (5) and (6). According to Table 2 (where "Set." is a short notation for "Setting"), the P-MPC scheme can achieve a very high replica detection rate, even when an identity is mapped to three destination cells. Moreover, we notice that the larger the differences between the probabilities $p_{c_i}$, the higher is $p_{ir}$.
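The sum in (5) can be evaluated by brute force for small r and v. The sketch below is an illustration under simplifying assumptions, not the authors' code: each of the r forwarding neighbors picks one cell independently according to the $p_{c_i}$, and any cell that receives claims from both nodes is assumed to detect the conflict. It computes the probability that the two claims share no cell, which is the quantity summed in (5); its complement is the corresponding detection probability. All function names are hypothetical.

```python
from itertools import product
from math import prod


def cell_set_distribution(p: list[float], r: int) -> dict[frozenset, float]:
    """Distribution of the set of cells chosen by r independent neighbors."""
    dist: dict[frozenset, float] = {}
    for choice in product(range(len(p)), repeat=r):
        cells = frozenset(choice)
        dist[cells] = dist.get(cells, 0.0) + prod(p[c] for c in choice)
    return dist


def prob_two_nodes_undetected(p: list[float], r: int) -> float:
    """Probability that the claims of l1 and l2 reach disjoint cell sets."""
    total = 0.0
    for cells, prob in cell_set_distribution(p, r).items():
        # Probability mass of the cells that hold no claim from l1.
        free_mass = sum(p[c] for c in range(len(p)) if c not in cells)
        # All r forwarders of l2 must pick among those cells.
        total += prob * free_mass ** r
    return total


skewed = prob_two_nodes_undetected([0.80, 0.15, 0.05], r=3)
uniform = prob_two_nodes_undetected([1 / 3, 1 / 3, 1 / 3], r=3)
print(1 - skewed, 1 - uniform)  # detection probabilities under this model
```

Under this model, a skewed distribution leaves less probability mass outside the cells already used by $l_1$ than a uniform one, so the skewed setting yields the higher detection probability.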


2. Refer to Section 6.3 for the more detailed comparison.

TABLE 2. Detection Rates When There Are 2 or 3 Nodes with the Same Identity, Given Different Settings of the Distribution of Forwarding Probabilities


6.1.2 Resilience against Node Compromise

Let $p_{ts}^{SDC}(t)$ and $p_{ts}^{P\text{-}MPC}(t)$ denote the functions that output the $p_{ts}$ of the SDC scheme and the P-MPC scheme, respectively, when the number of compromised nodes is t. Let $p_{tm}^{SDC}(t)$ and $p_{tm}^{P\text{-}MPC}(t)$ denote the functions that output the $p_{tm}$ of the SDC scheme and the P-MPC scheme, respectively, when the number of compromised nodes is t. Assuming that the adversary's capability of compromising nodes is bounded by $t^{*}$, we have $\sum_{i=1}^{v} t_i = t^{*}$, where $t_i$ is the number of nodes compromised in cell $C_i$.

Let $C_{t1}$ denote the set of all the combinations of choosing 1 to v elements from C. For any element in $C_{t1}$, denoted as $C_{f1}$, the probability that the adversary controls all the witnesses of a given identity, when such a set of cells in C (i.e., $C_{f1}$) is chosen as the destination cell(s), is the product of the individual probabilities $p_{ts}$ of the cells. Let $p_i$ denote the probability that exactly the cells in $C_{f1}$ are chosen as the destination cells by the r neighbors that forward the location claim. Let $p_{ts}^{SDC}(t_j)$ denote the $p_{ts}$ of the jth cell of $C_{f1}$ when the number of nodes compromised in this cell is $t_j$. Thus, $p_{ts}^{P\text{-}MPC}(t)$ can be calculated as follows:

$$p_{ts}^{P\text{-}MPC}(t) = \sum_{i=1}^{|C_{t1}|} \left( p_i \cdot \prod_{j=1}^{|C_{f1}|} p_{ts}^{SDC}(t_j) \right). \tag{7}$$

Note that in (7), $|C_{t1}|$ denotes the number of all the combinations of choosing 1 to v elements from C, while $|C_{f1}|$ denotes the number of cells contained in a chosen combination, i.e., $C_{f1}$. In addition, $p_{ts}^{SDC}(t_j) = 1$ when there is no witness in the jth cell of $C_{f1}$.

Let $r = 3$ and $v = 3$. In Table 3, we show the estimated success rate with which adversaries control all the witnesses under different compromising strategies (i.e., various distributions of $t_i$) and probability distributions of the destination cells (i.e., $p_{c_i}$) in the P-MPC scheme, when $s = 100$, $w = 5$, and $t^{*} = 30$. The settings of $t_i$ and $p_{c_i}$ are shown in Tables 4 and 5, respectively.

From Table 3, we notice that the best strategy for adversaries is to compromise only nodes in the cell with the highest $p_{c_i}$, i.e., setting A of $t_i$, rather than spreading their limited capability of compromising nodes among multiple cells in C. Assuming that the adversary selects this optimal strategy, the larger the differences between the $p_{c_i}$, the larger is $p_{ts}^{P\text{-}MPC}$ and thus the weaker the resilience of the scheme to node compromise.

Compared to SDC, P-MPC is more robust to node compromise. Assuming that adversaries follow the best strategy just described, i.e., compromising only nodes in the cell with the highest $p_{c_i}$, (7) can be converted into:

$$p_{ts}^{P\text{-}MPC}(t) = p_{c_1}^{r} \cdot p_{ts}^{SDC}(t). \tag{8}$$

As a result, compared to the SDC approach, the success rate with which adversaries control all the witnesses of a given identity is reduced by a factor of $1 - p_{c_1}^{r}$.
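To see how (7) reduces to (8), the following sketch enumerates the possible destination-cell sets and multiplies the per-cell SDC probabilities. It is an illustration under stated assumptions: every chosen cell is taken to hold exactly w witnesses (the special case of a cell with no witness is ignored), the r forwarders choose cells independently, and the function names are hypothetical.

```python
from itertools import product
from math import comb, prod


def p_ts_sdc(s: int, w: int, t: int) -> float:
    """Per-cell probability from (1); zero when fewer than w nodes are compromised."""
    return comb(s - w, t - w) / comb(s, t) if t >= w else 0.0


def p_ts_pmpc(p: list[float], r: int, s: int, w: int, t_per_cell: list[int]) -> float:
    """Evaluate (7): sum over destination-cell sets of p_i times the
    product of the per-cell probabilities of controlling all witnesses."""
    dist: dict[frozenset, float] = {}
    for choice in product(range(len(p)), repeat=r):
        cells = frozenset(choice)
        dist[cells] = dist.get(cells, 0.0) + prod(p[c] for c in choice)
    return sum(
        prob * prod(p_ts_sdc(s, w, t_per_cell[c]) for c in cells)
        for cells, prob in dist.items()
    )


p = [0.80, 0.15, 0.05]
concentrated = p_ts_pmpc(p, r=3, s=100, w=5, t_per_cell=[30, 0, 0])
spread = p_ts_pmpc(p, r=3, s=100, w=5, t_per_cell=[10, 10, 10])
print(concentrated, spread)
```

When all 30 compromised nodes sit in $C_1$, every term that involves another cell vanishes, and only the term in which all r forwarders choose $C_1$ survives, which is exactly the right-hand side of (8).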

Unlike the SDC scheme, where each identity is mapped to only one cell, in P-MPC each identity may be mapped to multiple cells. Since the cells for a given identity are determined by geographic hash functions, those cells are uniformly distributed. Therefore, on average for each cell, there are s identities choosing it with the probability $p_{c_1}, p_{c_2}, \ldots, p_{c_v}$, respectively. Assuming that instead of spreading the limited capability of compromising nodes over multiple cells, adversaries only compromise the nodes in a given cell, we can calculate $p_{tm}^{P\text{-}MPC}(t)$ via (9).

$$p_{tm}^{P\text{-}MPC}(t) = \sum_{i=1}^{v} p_{c_i}^{r} \cdot p_{tm}^{SDC}(t). \tag{9}$$

When the differences between the $p_{c_i}$ are high, e.g., Setting I in Table 5, $p_{tm}^{P\text{-}MPC}(t)$ can be approximated as $p_{c_1}^{r} \cdot p_{tm}^{SDC}(t)$. In such cases, compared to the SDC scheme, the success rate with which adversaries control all the witnesses for at least one identity is reduced by a factor of $1 - p_{c_1}^{r}$ as well.

In P-MPC, even if adversaries compromise all the nodes in the cell to which the location claims are forwarded with the highest probability, i.e., $p_{c_1}$, node replication can still be detected by witnesses in the other cells. For example, assuming that $p_{c_1} = 80\%$ and $r = 3$, the replica can still be detected with a probability of $1 - p_{c_1}^{3} = 48.8\%$.

6.1.3 Denial-of-Service Attacks

Two possible Denial-of-Service (DoS) attacks against our approach are as follows: 1) An adversary inserts a large number of fake location claims into the network so as to exhaust the energy and computational resources of other nodes, which will verify the signatures included in the location claims according to the approach proposed. 2) If some of a node L's neighbors are controlled by the adversary, instead of choosing the destination cell based on the probabilistic distribution and the geographic hash function, the adversary may forward the location claim to as many cells as possible, leading to additional communication overhead when the claim is flooded within each cell.


TABLE 3. Probability that the Adversary Controls All w Witnesses for a Given Identity after Compromising $t^{*}$ Nodes in a Cell of Size s in the P-MPC Scheme ($s = 100$, $w = 5$, $t^{*} = 30$)

TABLE 4. Settings on the Distribution of # of Compromised Nodes

TABLE 5. Settings of the Distribution of Forwarding Probabilities


For the first attack, in both SDC and P-MPC, any fake location claim would fail the verification process and thus will not be forwarded further. As to the latter, which is only applicable to P-MPC, if the destination cell chosen is not an element of C (i.e., the set of cells to which the given identity is mapped) or a neighbor forwards the same location claim to more than one cell, the attack would be detected by other neighbors of L, although this requires the neighbors to listen promiscuously. To avoid detection based on signature verifications, the best strategy for this type of DoS attack is to ignore the probabilistic distribution used by P-MPC for selecting destination cells, and let different neighbors choose different destination cells in C. However, as shown by our analysis, a small number of cells ($v = 3$) is sufficient for P-MPC to provide a high level of resilience against node compromise while ensuring a very high detection rate for node replication. Therefore, the effectiveness of this attack is limited.

6.2 Efficiency Analysis

When analyzing the efficiency of the P-MPC scheme, we follow the same metrics employed in Section 5.2.

6.2.1 Communication Cost

Similar to the SDC scheme, the communication cost for P-MPC has two components: the cost for propagating the location claim to the cells chosen and the cost for flooding the claim within these cells, denoted as $CO_{fw}$ and $CO_{fl}$, respectively.

Assuming that in the P-MPC scheme there are on average r neighbors forwarding a location claim, the communication complexity of $CO_{fw}$ is $O(r \cdot \sqrt{n})$ in P-MPC, if we assume that the neighbors of L forward the location claim independently and do not consider further optimizations, e.g., a node only forwards the location claims with the same identity and location information once within a certain time interval.

The communication complexity of $CO_{fl}$ in the P-MPC scheme can be estimated as follows. Since there are r neighbors of L forwarding the location claim, the probability that any cell in C (i.e., $C_i$) is chosen by at least one out of r neighbors is:

$$p_{si} = 1 - (1 - p_{c_i})^{r}.$$

Therefore, the complexity of $CO_{fl}$ in the P-MPC scheme can be described as $O(s \cdot \sum_{i=1}^{v} p_{si})$. Table 6 shows the value of $\sum_{i=1}^{v} p_{si}$ in terms of different settings of the $p_{c_i}$ when $v = 3$. According to Table 6, the larger the differences between the $p_{c_i}$, the smaller the extra overhead of flooding the location claim, when compared to the SDC scheme.
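The factor $\sum_{i=1}^{v} p_{si}$ is the expected number of cells in which the claim is flooded, so it can be computed directly from the forwarding probabilities. The sketch below uses hypothetical function names and the Setting I values quoted in Section 7.2; the uniform distribution is included only as a contrast and is not claimed to be one of the settings in Table 6.

```python
def cell_chosen_probs(p: list[float], r: int) -> list[float]:
    """p_si = 1 - (1 - p_ci)^r: cell C_i is picked by at least one of r forwarders."""
    return [1.0 - (1.0 - pc) ** r for pc in p]


def expected_flooded_cells(p: list[float], r: int) -> float:
    """Sum of the p_si, i.e., the expected number of cells that flood the claim."""
    return sum(cell_chosen_probs(p, r))


print(expected_flooded_cells([1.0], r=3))                    # SDC: one cell
print(expected_flooded_cells([0.80, 0.15, 0.05], r=3))       # about 1.52
print(expected_flooded_cells([1 / 3, 1 / 3, 1 / 3], r=3))    # about 2.11
```

The skewed distribution floods roughly 1.5 cells on average, against about 2.1 for the uniform one, which illustrates why the second condition on the $p_{c_i}$ helps efficiency.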

6.2.2 Memory Overhead

In a similar fashion, we can see that the memory overhead of the P-MPC scheme is given by $s \cdot p_s \cdot \sum_{i=1}^{v} p_{si}$.

6.3 Summary

Before presenting empirical results in Section 7, in Table 7, we summarize the average communication cost and memory overhead per node of the two variants of the Localized Multicast approach, together with the two multicast algorithms proposed in [14] (i.e., Randomized Multicast and Line-Selected Multicast) and the RED protocol [2].

In Table 7, we denote the density of the network (i.e., the average number of neighbors per node), the probability that a neighbor of node L decides to forward L's location claim, and the number of witness nodes storing the location claim for a given identity in our approach as d, $p_f$, and w, respectively. In addition, let g denote the number of destinations (i.e., witnesses) to which a neighbor forwards the location claim, if it decides to help, in the Randomized Multicast and Line-Selected Multicast algorithms and the RED protocol. The Random Multicast algorithm clearly has a huge communication and memory overhead, and thus, in the following we only compare our approaches with the Line-Selected Multicast algorithm and the RED protocol.

6.3.1 Comparison with the Line-Selected Multicast Algorithm

According to the analysis in Sections 5 and 6, we know that r can be set to a small value, e.g., 3, while still ensuring a higher success rate of detecting replicas. To maintain a relatively high detection rate, the typical setting of $g \cdot p_f \cdot d$ in the Line-Selected Multicast algorithm is 6. Therefore, the $CO_{fw}$ of either SDC or P-MPC is smaller than the corresponding communication cost of the Line-Selected Multicast algorithm. However, our approach has the extra overhead of flooding the location claim within one or more cells, i.e., $CO_{fl}$.

Note that for both SDC and P-MPC, the lower bound of the cell size is determined by the security requirements. Once the cell size and the flooding algorithm within the cell are chosen, $CO_{fl}$ is fixed and independent of the network size. According to Table 7, we know that the extra overheads of the Random Multicast and the Line-Selected Multicast algorithms over the $CO_{fw}$ of our approach can be described as $(\sqrt{n} - r) \cdot S$ and $(g \cdot p_f \cdot d - r) \cdot S$, respectively, where S denotes the average communication cost of forwarding a packet between a randomly chosen pair of nodes in the network. S is tightly related to the network size (i.e., the complexity of S is $O(\sqrt{n})$) and the network topology (i.e., for the same network size, S under an irregular topology is higher than that under a regular uniform topology). Consequently, our schemes are more scalable and less sensitive to irregular topologies, when compared to the two algorithms proposed in [14].

TABLE 6. $\sum_{i=1}^{v} p_{si}$ in Terms of Different Settings on $p_{c_i}$ ($v = 3$)

TABLE 7. Comparisons of Average Communication Cost and Memory Overhead

The analysis in Sections 5.1 and 6.1 shows that it is sufficient to choose a small value for w to resist node compromise, and thus, our approaches provide far better memory efficiency, compared to the Randomized Multicast and Line-Selected Multicast algorithms, especially when the network size is very large.

6.3.2 Comparison with the RED Protocol

In Table 7, the complexity of the communication overhead of the RED protocol is the same as that of the Line-Selected Multicast algorithm. However, theoretically, g and $p_f$ can be set in such a way that $g \cdot p_f \cdot d$ is smaller than the typical setting in the Line-Selected Multicast algorithm, since the RED protocol can detect the replicas as long as at least one neighbor forwards the location claim and there is no communication loss. In this sense, the communication overhead of RED is the same as $CO_{fw}$ in SDC, if $p_f \cdot d = r$ and $g = 1$, but without the extra overhead of flooding within one or a few cells. Nevertheless, in practice, due to communication loss and routing errors, we should set $g \cdot p_f \cdot d$ to a higher value to ensure a certain level of detection rate. For example, for this reason, in our simulation when $p_f = 3/d$ the actual detection rate of SDC is slightly lower than that of P-MPC. Consequently, since $CO_{fl}$ is fixed and independent of the network size, the communication overhead of SDC and P-MPC will be only slightly higher than that of RED, in particular when the network size is large.

The memory overheads of SDC and RED are w and $g \cdot p_f \cdot d$, respectively. Both of them are small numbers, e.g., 2 to 5, and thus, the memory overheads of these two algorithms are comparable. The memory overhead of P-MPC is slightly higher than that of SDC or RED.

7 EVALUATION

We evaluated the performance and security of our schemes and those proposed by Parno et al. via extensive simulations. To enable a fair comparison, we used the same simulation methodology and simulation code that were used in Parno et al.'s study [14]. In addition, we investigated the security and efficiency of our approach under different settings, such as different probabilities of forwarding location claims.

7.1 Metrics

We used the following metrics to compare the schemes:

• Communication overhead: We measured the total number of packets sent and received per node for running the replica detection algorithm when n nodes are added to the network. We denote this metric as $n_f$.

• Success rate in detecting replicas: We measured the probability of detecting a replica when there are two sensors with the same identity in the network, i.e., $p_{2r}$.

7.2 System and Network Models

As in Parno et al.'s study, we considered both uniform and irregular network topologies. In the uniform topology, nodes are randomly distributed within a $500 \times 500$ square. The network size (n) varies between 1,000 and 10,000. We assume a bidirectional communication model, and adjust the transmission range so that the average number of neighbors of a sensor (d) is 40. We also considered six irregular topologies (as shown in Fig. 5), i.e., "Thin H," "Thin Cross," "Thin 2," "Large Cross," "L," and "Large H" with the same density, i.e., $d = 40$. As in Parno et al.'s study [14], these topologies are generated as subregions of the regular topology ($n = 10{,}000$).

The Localized Multicast approach assumes that the network is divided into cells, and that a location claim is flooded within the destination cell(s). There has been extensive work on optimal/efficient flooding [10], [9], [8], [16], and our approaches can be easily integrated with any efficient flooding algorithm. In all our simulations, we used the following simple algorithm, unless specified otherwise. Let $N_{cell} = k^2$ denote the number of cells in the network. Let l denote the length of the side of the network. The size of a cell is selected in such a way that one broadcast can cover most of the area of a cell. Thus, we have

$$k = \mathrm{round}\left(\frac{l}{\sqrt{2} \cdot R}\right) = \mathrm{round}\left(\frac{l}{\sqrt{2} \cdot \sqrt{\dfrac{d \cdot l^2}{\pi \cdot n}}}\right) = \mathrm{round}\left(\sqrt{\frac{\pi n}{2d}}\right),$$

where R is the communication range of a node and $\mathrm{round}()$ is a function that rounds the input to the nearest integer. For nodes not covered by the broadcast, further unicasts are required to deliver the location claim.
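The cell-count formula can be checked numerically for the simulated network sizes. This is a small sketch with hypothetical function names; it assumes the density relation $d = n \pi R^2 / l^2$ that the middle term of the formula implies, and it rounds halves up because Python's built-in `round` rounds halves to even.

```python
import math


def communication_range(n: int, d: float, side: float) -> float:
    """R such that a disk of radius R holds d neighbors on average,
    assuming n nodes spread uniformly over a side x side square."""
    return math.sqrt(d * side ** 2 / (math.pi * n))


def cells_per_side(n: int, d: float) -> int:
    """k = round(sqrt(pi * n / (2 * d))); the network then has k * k cells."""
    return math.floor(math.sqrt(math.pi * n / (2 * d)) + 0.5)


for n in (1_000, 10_000):
    k = cells_per_side(n, d=40)
    print(n, k, k * k, n / (k * k))  # size, k, number of cells, nodes per cell
```

For $n = 10{,}000$ and $d = 40$ this gives $k = 20$, i.e., 400 cells with 25 nodes per cell on average.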

For the Random Multicast and Line-Selected Multicast algorithms, we use the same settings as in [14], except for $p_f$ in the Line-Selected Multicast algorithm. More specifically, for the former, we set the number of sensors storing a given location claim to $\sqrt{n}$, i.e., $w = \sqrt{n}$; for the latter, we set the number of lines to 6 (i.e., $f = 6$) in the comparison. As to $p_f$ in the Line-Selected Multicast algorithm, it is set to $1/d$ in [14], and each forwarding node randomly picks f destinations. In our simulation, we set $p_f = f/d$, and each forwarding node randomly picks only one destination. Given the same density, our setting of $p_f$ results in a lower probability that there is no neighbor forwarding the location claim. As a result, compared to [14], in our simulation the Line-Selected Multicast algorithm has a higher success rate of detecting node replication, as shown in Section 7.3.2.

For both SDC and P-MPC, we set $p_f = 3/d$ and $p_s = 0.2$. In addition, for P-MPC, we use Setting I in Table 5 as the setting of the $p_{c_i}$ in the simulation. Namely, $v = 3$, and $p_{c_1}$, $p_{c_2}$, and $p_{c_3}$ are 80 percent, 15 percent, and 5 percent, respectively.

7.3 Comparisons with Parno et al.’s Work

7.3.1 Communication Overhead

The figures below show the 95 percent confidence intervals of the reported metric. In Fig. 6, we compare the communication costs of our two schemes with the two algorithms proposed in [14] for uniform topologies. As shown in Fig. 6, the Random Multicast algorithm has the highest communication costs under all the settings. Among the remaining schemes, SDC has the lowest communication overhead, though the differences between SDC, P-MPC, and Line-Selected Multicast are relatively small. As the network size increases, P-MPC and SDC have lower overhead than Line-Selected Multicast. Fig. 6 shows that SDC and P-MPC have lower communication overheads than Line-Selected Multicast when $n \ge 2{,}000$ and $n \ge 4{,}000$, respectively.

In Fig. 7, we compare the communication costs of our two schemes with the two algorithms proposed in [14] for irregular topologies. In comparison to Line-Selected Multicast, both SDC and P-MPC show much stronger adaptability to irregular network topologies. Under all the irregular topologies, the $n_f$ values of our two schemes are smaller than that of the Line-Selected Multicast algorithm. In particular, under the "Thin H," "Thin Cross," "Thin 2," and "Large H" topologies, the advantage of our two schemes over the Line-Selected Multicast algorithm is much higher than that under the regular topology ($n = 10{,}000$). More specifically, under these four topologies, SDC's and P-MPC's advantage over the Line-Selected Multicast algorithm (in terms of the communication cost) is 149 percent to 181 percent and 238 percent to 296 percent, respectively, higher than that under the regular topology ($n = 10{,}000$).

7.3.2 Replica Detection Success Rate

Due to the high cost of the Random Multicast algorithm, we only consider SDC, P-MPC, and the Line-Selected Multicast algorithm when comparing the success rates of detecting node replication.

Fig. 8 shows that, compared to the Line-Selected Multicast algorithm, both of our algorithms have much higher success rates of detecting node replication. More specifically, on average, the success rates of SDC and P-MPC in detecting node replication are 25.64 percent and 21.77 percent higher than that of the Line-Selected Multicast algorithm, respectively.

Fig. 5. Irregular topologies: (a) Thin H, (b) Thin Cross, (c) Thin 2, (d) Large Cross, (e) L, and (f) Large H.

Fig. 6. Communication overhead of SDC, P-MPC, Random Multicast, and Line-Selected Multicast for uniform topologies.

Fig. 7. Communication overhead of SDC, P-MPC, and Line-Selected Multicast for irregular network topologies.

We notice, however, that the success rates of SDC range from 89.4 percent to 94.5 percent, which is lower than the expected value (i.e., 100 percent) according to the theoretical analysis in Section 5.1. There are two reasons for this. The main reason is that each neighbor decides independently whether to forward the location claim, and thus, there is a nonzero probability that no neighbor forwards the location claim. As a result, SDC fails to detect a node replication attack if, for either of the two replicas, no neighbor forwards its location claim. In addition, when p_s is too small, there is a nonzero probability that no node within cell C stores the location claim, which may also cause SDC to fail to detect node replication. In the simulation, we set p_s = 0.2, and thus, the second reason has only a negligible effect.

For the same reasons, the simulation results for P-MPC’s success rates of detecting node replication are lower than the expected value according to the theoretical analysis in Section 6.1.
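The two failure modes described above can be illustrated with a short calculation. The following is a minimal sketch under a simplified model that is not taken from the paper's analysis: it assumes that each replica has `num_neighbors` neighbors that forward independently with probability `p_f`, that the mapped cell contains `nodes_in_cell` nodes that store independently with probability `p_s`, and that detection succeeds when both claims are forwarded and at least one cell node stored the first claim. All function names and the numeric values other than p_s = 0.2 are hypothetical.

```python
def prob_claim_forwarded(p_f: float, num_neighbors: int) -> float:
    """Probability that at least one neighbor forwards a location claim."""
    return 1.0 - (1.0 - p_f) ** num_neighbors


def prob_claim_stored(p_s: float, nodes_in_cell: int) -> float:
    """Probability that at least one node in the mapped cell stores the claim."""
    return 1.0 - (1.0 - p_s) ** nodes_in_cell


def sdc_detection_estimate(p_f: float, num_neighbors: int,
                           p_s: float, nodes_in_cell: int) -> float:
    """Rough detection estimate under the simplified model stated above.

    Both replicas' claims must leave their neighborhoods (forwarded ** 2),
    and some node in the cell must have kept the first claim (stored).
    """
    forwarded = prob_claim_forwarded(p_f, num_neighbors)
    stored = prob_claim_stored(p_s, nodes_in_cell)
    return forwarded ** 2 * stored


if __name__ == "__main__":
    # Illustrative (assumed) values: 20 neighbors, 25 nodes per cell.
    for p_f in (0.05, 0.1, 0.2):
        est = sdc_detection_estimate(p_f, 20, 0.2, 25)
        print(f"p_f={p_f:.2f}: estimated detection probability {est:.3f}")
```

Under these assumptions, the storing term is very close to 1 for p_s = 0.2, so the forwarding term dominates the gap from 100 percent, in line with the explanation above.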

In the Line-Selected Multicast algorithm, the probability of detecting replicas can be improved by increasing the number of lines involved in forwarding a location claim, i.e., f. Fig. 9 shows that a larger value of f leads to a higher p2r. For example, when f increases from 6 to 8, the probability of detecting replicas increases by 13.62 percent on average. However, as a trade-off, the communication cost increases by 28.22 percent on average at the same time, as shown in Fig. 10.

7.4 Evaluation of Our Approach under Different Settings

In the following, we present the results of simulations that aim at evaluating both the security and the efficiency of our approach under different settings. According to the collected results, when a parameter is changed, SDC and P-MPC share the same trend, i.e., they both increase, decrease, or remain unchanged. Due to space limitations, we only present the results related to SDC.

7.4.1 Different Settings on the Probability of Forwarding Location Claims

Similar to the impact of f on the Line-Selected Multicast algorithm, as shown in Figs. 11 and 12, a higher probability of forwarding location claims (i.e., p_f) in SDC improves the probability of detecting replicas, while at the same time raising the communication overhead.

7.4.2 Different Settings on the Probability of Storing Location Claims

Intuitively, when the probability of storing location claims (i.e., p_s) increases, it becomes less likely that no sensor in a cell stores a location claim forwarded to this cell. As a result, as shown in Fig. 13, the detection rate is improved at the cost of a higher memory overhead. The communication overhead is unchanged, since the same flooding overhead applies regardless of the value of p_s, as long as the location claim arrives at the mapped cell.

Fig. 8. Success rate of detecting replicas in SDC, P-MPC, and Line-Selected Multicast.

Fig. 9. Success rate of detecting replicas in Line-Selected Multicast with different numbers of lines.

Fig. 10. Communication overhead of Line-Selected Multicast with different numbers of lines.

Fig. 11. Success rate of detecting replicas in SDC with different settings on p_f.

7.4.3 Different Settings on the Cell Size

To evaluate the influence of the cell size on our approach, we tested three settings on the cell size, i.e., s_1, 2s_1, and 4s_1, where s_1 is the default cell size when the network is partitioned into cells through the method described in Section 7.2.

Given the same probability of storing location claims, the larger the cell size, the larger the average number of witnesses per location claim, and thus the lower the probability that there is no witness for a location claim flooding the cell, which results in a higher detection rate. This observation is confirmed by our results: Fig. 14 shows that p2r increases as the cell size is raised.
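The relationship between cell size and the number of witnesses can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a uniform node density, so that the number of nodes in a cell scales linearly with the cell size, and it assumes that each node stores a claim independently. The base count of 10 nodes per default cell and the function names are hypothetical, while the storing probability of 0.2 is the value used in the simulation.

```python
def expected_witnesses(p_s: float, nodes_in_cell: int) -> float:
    """Average number of nodes in the cell that store a given location claim."""
    return p_s * nodes_in_cell


def prob_no_witness(p_s: float, nodes_in_cell: int) -> float:
    """Probability that no node in the cell stores the location claim."""
    return (1.0 - p_s) ** nodes_in_cell


if __name__ == "__main__":
    base_nodes = 10  # assumed number of nodes in a cell of the default size
    p_s = 0.2        # storing probability used in the simulation
    for factor in (1, 2, 4):  # the three cell sizes that were tested
        n = base_nodes * factor  # linear scaling is an assumption
        print(f"cell size x{factor}: "
              f"expected witnesses = {expected_witnesses(p_s, n):.1f}, "
              f"P(no witness) = {prob_no_witness(p_s, n):.2e}")
```

Because the no-witness probability shrinks geometrically with the number of nodes in the cell, even doubling the cell size reduces it sharply in this model.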

In terms of communication overhead, as shown in Fig. 15, although the cell size is increased by 100 percent and 300 percent, the overall communication cost per node increases by only 12.2 percent and 24.6 percent, respectively. This is mainly because the flooding cost is a relatively small portion of the overall communication cost. Moreover, when the cell size increases, the average number of hops between a neighbor forwarding the location claim and the mapped cell decreases. In other words, the forwarding cost is reduced.

8 CONCLUSION AND FUTURE WORK

In this paper, we proposed two variants of the Localized Multicast approach for distributed detection of node replication attacks in wireless sensor networks. Unlike the two randomized algorithms proposed by Parno et al. [14], our approach combines deterministic mapping (to reduce communication and storage costs) with randomization (to increase the level of resilience to node compromise). Our theoretical analysis and empirical results show that, compared to Parno et al.’s algorithms, our schemes are more efficient in large-scale sensor networks, in terms of communication and memory costs. Moreover, the probability of replica detection in our approach is higher than that achieved in these two algorithms.
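The combination of deterministic mapping and randomized storage can be sketched in a few lines. The following is a hypothetical illustration rather than the protocol specification: it assumes that a node ID is mapped to a cell index by hashing (SHA-256 is an arbitrary choice made here), that a claim reaching the cell is seen by every node in it, and that each node keeps a first-seen claim with probability `p_s`. Signature verification, routing, and revocation are omitted, and all class and function names are invented for the example.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationClaim:
    node_id: str
    location: tuple  # (x, y) coordinates claimed by the node


def map_id_to_cell(node_id: str, num_cells: int) -> int:
    """Deterministic mapping: the same ID always maps to the same cell."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


class CellNode:
    """A node inside the mapped cell that may act as a witness."""

    def __init__(self, p_s: float, rng: random.Random):
        self.p_s = p_s
        self.rng = rng
        self.stored = {}  # node_id -> first location stored by this node

    def receive(self, claim: LocationClaim) -> bool:
        """Return True if the claim conflicts with a stored claim."""
        known = self.stored.get(claim.node_id)
        if known is not None:
            return known != claim.location
        # Randomized storage: keep the claim with probability p_s.
        if self.rng.random() < self.p_s:
            self.stored[claim.node_id] = claim.location
        return False


def flood_cell(cell_nodes: list, claim: LocationClaim) -> bool:
    """Deliver the claim to every node in the cell; report any conflict."""
    results = [node.receive(claim) for node in cell_nodes]
    return any(results)
```

In this sketch, two claims that carry the same ID but different locations are routed to the same cell, so a conflict is noticed whenever at least one node in that cell stored the first claim.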

Our preliminary analysis also shows that our approaches are more robust than RED against selective node compromise, and that the communication and memory overheads of our approaches are similar to or slightly higher than those of RED. As future work, we plan to simulate the RED protocol and then carry out a more detailed comparison of efficiency based on empirical results.

ACKNOWLEDGMENTS

A preliminary version of this article appeared in the 2007 Proceedings of the Annual Computer Security Applications Conference. This work is based on previous work at the Center for Secure Information Systems, George Mason University.


Fig. 12. Communication overhead of SDC with different settings on p_f.

Fig. 13. Success rate of detecting replicas in SDC with different settings on p_s.

Fig. 14. Success rate of detecting replicas in SDC with different cell sizes.

Fig. 15. Communication overhead of SDC with different cell sizes.


REFERENCES

[1] H. Choi, S. Zhu, and T.F. La Porta, “SET: Detecting Node Clones in Sensor Networks,” Proc. Third Int’l Conf. Security and Privacy in Comm. Networks (SecureComm), 2007.

[2] M. Conti, R. Di Pietro, L.V. Mancini, and A. Mei, “A Randomized, Efficient, and Distributed Protocol for the Detection of Node Replication Attacks in Wireless Sensor Networks,” Proc. ACM MobiHoc, pp. 80-89, 2007.

[3] J.R. Douceur, “The Sybil Attack,” Proc. First Int’l Workshop Peer-to-Peer Systems (IPTPS ’02), pp. 251-260, 2002.

[4] L. Eschenauer and V.D. Gligor, “A Key-Management Scheme for Distributed Sensor Networks,” Proc. Ninth ACM Conf. Computer and Comm. Security, pp. 41-47, 2002.

[5] G. Gaubatz, J.-P. Kaps, and B. Sunar, “Public Key Cryptography in Sensor Networks-Revisited,” Proc. First European Workshop Security in Ad-Hoc and Sensor Networks (ESAS ’04), pp. 2-18, 2004.

[6] F. Hess, “Efficient Identity Based Signature Schemes Based on Pairings,” Proc. Ninth Ann. Int’l Workshop Selected Areas in Cryptography (SAC ’02), pp. 310-324, 2002.

[7] B. Karp and H.T. Kung, “GPSR: Greedy Perimeter Stateless Routing for Wireless Networks,” Proc. ACM MobiCom, pp. 243-254, 2000.

[8] Y.-B. Ko, J.-M. Choi, and J.-H. Kim, “A New Directional Flooding Protocol for Wireless Sensor Networks,” Proc. Int’l Conf. Information Networking (ICOIN ’04), pp. 93-102, 2004.

[9] T.J. Kwon and M. Gerla, “Efficient Flooding with Passive Clustering (PC) in Ad Hoc Networks,” ACM SIGCOMM Computer Comm. Rev., vol. 32, no. 1, pp. 44-56, 2002.

[10] H. Lim and C. Kim, “Flooding in Wireless Ad Hoc Networks,” Computer Comm., vol. 24, nos. 3/4, pp. 353-363, 2000.

[11] D.J. Malan, M. Welsh, and M.D. Smith, “A Public-Key Infrastructure for Key Distribution in TinyOS Based on Elliptic Curve Cryptography,” Proc. IEEE Conf. Sensor and Ad Hoc Comm. and Networks (SECON), pp. 71-80, 2004.

[12] S. Marti, T.J. Giuli, K. Lai, and M. Baker, “Mitigating Routing Misbehavior in Mobile Ad Hoc Networks,” Proc. ACM MobiCom, pp. 255-265, 2000.

[13] J. Newsome, E. Shi, D. Song, and A. Perrig, “The Sybil Attack in Sensor Networks: Analysis & Defenses,” Proc. Third Int’l Symp. Information Processing in Sensor Networks (IPSN ’04), pp. 259-268, 2004.

[14] B. Parno, A. Perrig, and V. Gligor, “Distributed Detection of Node Replication Attacks in Sensor Networks,” Proc. IEEE Symp. Security and Privacy (S&P ’05), pp. 49-63, 2005.

[15] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker, “GHT: A Geographic Hash Table for Data-Centric Storage,” Proc. First ACM Int’l Workshop Wireless Sensor Networks and Applications (WSNA), pp. 78-87, 2002.

[16] H. Sabbineni and K. Chakrabarty, “Location-Aided Flooding: An Energy-Efficient Data Dissemination Protocol for Wireless Sensor Networks,” IEEE Trans. Computers, vol. 54, no. 1, pp. 36-46, Jan. 2005.

[17] A. Seshadri, A. Perrig, L.V. Doorn, and P. Khosla, “SWATT: SoftWare-Based ATTestation for Embedded Devices,” Proc. IEEE Symp. Security and Privacy (S&P ’04), pp. 272-282, 2004.

[18] B. Zhu, V.G.K. Addada, S. Setia, S. Jajodia, and S. Roy, “Efficient Distributed Detection of Node Replication Attacks in Sensor Networks,” Proc. 23rd Ann. Computer Security Applications Conf. (ACSAC ’07), 2007.

Bo Zhu received the BEng and MEng degrees from Wuhan University in 1996 and 1999, respectively, and the MSc and PhD degrees from the National University of Singapore in 2002 and 2006, respectively. He joined the Concordia Institute for Information Systems Engineering as an assistant professor in 2007. Before joining Concordia University, he was a postdoctoral researcher in the Center for Secure Information Systems at George Mason University for two years. His research interests include security and privacy issues in wireless networks, intrusion detection systems and malware detection, data security and privacy, Internet and peer-to-peer networks security, and applied cryptography. He is a member of the IEEE.

Sanjeev Setia received the PhD degree from the University of Maryland, College Park, in 1993. He is a professor of computer science at George Mason University. His research interests include ad hoc and sensor networks, network security, and performance evaluation of computer systems. In recent years, he has worked extensively on security mechanisms and protocols for ad hoc and wireless sensor networks. He was a cofounder of the ACM Workshop on Security in Ad Hoc and Sensor Networks (SASN) and served as its coorganizer in 2003 and 2004. His research has been funded by the US National Science Foundation, NASA, and DARPA.

Sushil Jajodia received the PhD degree from the University of Oregon, Eugene. He is a university professor, BDM international professor of information technology, and the director of the Center for Secure Information Systems at George Mason University, Fairfax, Virginia. He joined George Mason after serving as the director of the Database and Expert Systems Program at the US National Science Foundation. Before that, he was the head of the Database and Distributed Systems Section at the US Naval Research Laboratory, Washington, and associate professor of computer science and director of graduate studies at the University of Missouri, Columbia. He was also a visiting professor at the University of Milan, the University of Rome “La Sapienza,” Italy, and at the Isaac Newton Institute for Mathematical Sciences, Cambridge University, England. The scope of his research interests encompasses information secrecy, privacy, integrity, and availability problems in military, civil, and commercial sectors. He has authored six books, edited 34 books and conference proceedings, and published more than 350 technical papers in the refereed journals and conference proceedings. He is also the holder of two patents and has several patent applications pending. He received the 1996 Kristian Beckman award from IFIP TC 11 for his contributions to the discipline of information security, the 2000 Outstanding Research Faculty Award from Mason’s Volgenau School of Information Technology and Engineering, and the 2008 ACM SIGSAC Outstanding Contributions Award for his research and teaching contributions to the information security field and his service to ACM SIGSAC and the computing community. He has served in different capacities for various journals and conferences. He is the founding editor-in-chief of the Journal of Computer Security and on the editorial boards of IET Information Security, the International Journal of Cooperative Information Systems, the International Journal of Information and Computer Security, and the International Journal of Information Security and Privacy. He is the consulting editor of the Springer International Series on Advances in Information Security. He is a senior member of the IEEE. More details about the author can be obtained at http://csis.gmu.edu/jajodia.

Sankardas Roy received the master of technology degree in computer science from the Indian Statistical Institute, Kolkata, India, in 2001, and the PhD degree in information technology from George Mason University in 2008. Currently, he is a postdoctoral researcher at the University of Memphis. His research interests include sensor network security, ad hoc network security, and network security in general. He is a member of the IEEE.

Lingyu Wang received the BE degree from Shen Yang Institute of Aeronautic Engineering in China, the ME degree from Shanghai Jiao Tong University, and the PhD degree in information technology from George Mason University. He is an assistant professor at the Concordia Institute for Information Systems Engineering at Concordia University, Montreal, Quebec, Canada. His research interests include database security, data privacy, vulnerability analysis, intrusion detection, and security metrics. He is a member of the IEEE.
