Top k query processing and malicious node identification based on node grouping in manets

Received February 18, 2016, accepted March 10, 2016, date of publication March 14, 2016, date of current version March 23, 2016.

Digital Object Identifier 10.1109/ACCESS.2016.2541864

Top-k Query Processing and Malicious Node Identification Based on Node Grouping in MANETs

TAKUJI TSUDA, YUKA KOMAI, TAKAHIRO HARA, (Senior Member, IEEE), AND SHOJIRO NISHIO, (Fellow, IEEE)

Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan

Corresponding author: Y. Komai ([email protected])

This work was supported by the Grant-in-Aid for Challenging Exploratory Research within the Ministry of Education, Culture, Sports, Science and Technology, Japan, under Grant 26540035.

ABSTRACT In mobile ad hoc networks (MANETs), it is effective to retrieve data items using top-k queries. However, accurate results may not be acquired in environments where malicious nodes are present. In this paper, we assume that malicious nodes attempt to replace necessary data items with unnecessary ones (we call these data replacement attacks), and propose methods for top-k query processing and malicious node identification based on node grouping in MANETs. In order to maintain the accuracy of the query result, nodes reply with the k data items with the highest scores along multiple routes, and the query-issuing node tries to detect attacks from the information attached to the reply messages. After detecting attacks, the query-issuing node tries to identify the malicious nodes through message exchanges with other nodes. When multiple malicious nodes are present, the query-issuing node may not be able to identify all malicious nodes in a single query. It is therefore effective for a node to share information about the identified malicious nodes with other nodes. In our method, each node divides all nodes into groups by using the similarity of the information about the identified malicious nodes. Then, it identifies malicious nodes based on the information on the groups. We conduct simulation experiments using the network simulator QualNet5.2 to verify that our method achieves high accuracy of the query result and identifies malicious nodes.

INDEX TERMS Ad hoc networks, top-k query processing, data replacement attack, grouping.

I. INTRODUCTION

Recently, there has been increasing interest in mobile ad hoc networks (MANETs), which are constructed only from mobile nodes. Since such self-distributed networks do not require pre-existing base stations, they are expected to be applied to various situations such as military affairs and rescue work at disaster sites. In MANETs, since each node has poor resources (i.e., the communication bandwidth and the battery life of mobile nodes are limited), it is effective to retrieve only the necessary data items using a top-k query, in which data items are ordered according to a particular attribute score, and the query-issuing node acquires the data items with the k highest scores in the network (the global top-k result). On the other hand, in MANETs, if a normal node becomes malicious owing to an attack from outside the network, the malicious node tries to disrupt the operations of the system. In this case, the user whose network contains the malicious node will typically continue to operate the system normally, unaware of the threat, while the malicious node may execute a variety of attacks (e.g., Denial of Service (DoS) attacks [28] such as the blackhole attack).

Let us consider the purpose of a malicious node attacking top-k query processing. Basically, malicious nodes attempt to disrupt the query-issuing node’s acquisition of the global top-k result for a long period, without being detected. However, DoS attacks in MANETs have been actively studied for many years, and as a result, using existing techniques, such attacks can be exposed by the query-issuing node or intermediate nodes. Here, a remarkable characteristic of top-k query processing is that the query-issuing node does not know the global top-k result beforehand. Therefore, even if a malicious node replaces high-score data items with its own low-score ones when relaying the data items, it is difficult for the query-issuing node to detect the attack, and it may believe that all the received data items with the k highest scores are the global top-k result. In this paper, we define a new type of attack called a data replacement attack (DRA), in which a malicious node replaces the received data items (which we call the local top-k result) with unnecessary yet proper data items (e.g., its own low-score data items). Since DRAs are a strong attack, and more difficult to detect than other traditional types of attack, some specific mechanism for defending against DRAs is required.

VOLUME 4, 2016. 2169-3536 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

FIGURE 1. Example of a top-k query in a MANET.

Fig. 1 shows an example of performing a top-k query in a MANET, where a rescue worker at a disaster site acquires the data items with the 2 highest scores (e.g., biological information about victims). Let us assume that the mobile node held by the rescue worker at P3 becomes a malicious node, and that it replaces the received highest-score data item, whose score is 94, with its own lower-score data item, whose score is 84. As a result, the node held by the rescue worker at P1, who issues the top-k query, cannot acquire the data item whose score is 94, and it cannot know that the node at P3 performed a DRA.

In this paper, we propose top-k query processing and malicious node identification methods against DRAs in MANETs. In the top-k query processing method, in order to maintain the accuracy of the query result and detect attacks, nodes reply with the data items with the k highest scores along multiple routes. Moreover, to enable detection of DRAs, reply messages include information on the route along which they are forwarded, and thus the query-issuing node can know which data items properly belong in each message. In the malicious node identification method, the query-issuing node first narrows down the malicious node candidates using information in the received messages, and then requests information on the data items sent by these candidates. In this way, the query-issuing node can identify the malicious node.

When there are multiple malicious nodes in the network, it is difficult to identify all the malicious nodes in a single query. With our methods, nodes are likely to identify the malicious nodes near their own location, while they can hardly identify malicious nodes far from their own location. Therefore, in order to quickly identify more malicious nodes, it is effective to share the information about the identified malicious nodes with other nodes. In this case, however, a malicious node may declare fake information that claims normal nodes to be malicious nodes (a false notification attack (FNA)). We need some method to correctly identify the malicious nodes in the presence of FNAs.

Therefore, in our malicious node identification method, after nodes share the malicious node identification information, each node divides all nodes into groups based on the similarity of the information. Then, the node makes the final judgment on malicious nodes based on the judgment result of each group. In our method, even if malicious nodes claim that normal nodes are malicious, there is a decisive difference in the nature of the information possessed by normal and malicious nodes concerning the identified malicious nodes, and therefore the normal nodes can easily identify the malicious nodes. Furthermore, even if malicious nodes mix the correct information on malicious nodes identified by other normal nodes with their fake information in order to increase their similarity with normal nodes, the normal nodes in the same group will nonetheless certainly identify the malicious nodes, and not normal nodes. Thus, the information from the malicious nodes can be removed and FNAs have little influence.

Our contributions are as follows:

• We describe a new attack model, DRA, in which a malicious node replaces necessary data items with unnecessary ones, and we analyze the effects of such an attack on top-k query processing when there are multiple malicious nodes in the network.

• We propose methods for processing top-k queries and for identifying malicious nodes against DRAs in MANETs.

• We describe an attack model, FNA, in which a malicious node sends fake information that claims some normal nodes to be malicious nodes, and we evaluate the effects of such an attack.

• We verify that our proposed methods can achieve high accuracy of the query result and identify malicious nodes, through extensive simulations that take into account physical layer effects in the network.

The remainder of the paper is organized as follows. In Section II, we review related work. In Section III, we describe our operating assumptions. In Section IV, we present our proposed methods for top-k query processing and identification of malicious nodes in MANETs. In Section V, we discuss our methods, including typical situations in which the query-issuing node cannot identify malicious nodes and the influence of FNAs. In Section VI, we discuss the results of the simulation experiments. Finally, in Section VII, we summarize the paper.

Note that some of the results of this paper have been reported in [23]. In this paper, we have mainly extended the malicious node identification method, assuming a situation in which there are multiple malicious nodes in the network. More specifically, the extended method shares information on the detected malicious nodes among nodes, and quickly detects more malicious nodes.

II. RELATED WORK

In this section, we review existing studies on secure routing, top-k query processing methods, and reputation systems.

A. SECURE ROUTING METHODS

In the field of MANETs, secure routing protocols that protect against falsification of data and DoS attacks [28] have been well studied. Secure routing protocols commonly employ data transmission along multiple routes (from the source node to the destination node) [11], [15], [16], [19], [34], and data encryption using symmetric or public keys [6], [9], [12]. In [11], the authors proposed a method in which the source node determines multiple safe routes (from the source node to the destination node) by encrypting the route request message using a hash function before sending data items. However, these methods neither assume top-k queries nor protect against DRAs, and thus cannot be directly applied to the problem addressed in this paper. In [6], the authors proposed a method in which each sensor node sends data items with a message authentication code (MAC), which are encrypted using a symmetric key. When a node receives the message, it confirms the validity of the message by checking whether the received MAC is the same as the MAC calculated from the received data items encrypted with the symmetric key. However, even if data items are encrypted, the DRAs described above cannot be prevented by these methods, because malicious nodes merely replace received data items with data items of their own.

B. TOP-k QUERY PROCESSING METHODS

In the field of database systems and distributed systems, top-k queries are an effective way to retrieve only the required data items from a large amount of data. In [2], [5], [18], [20], and [29], the authors proposed methods to reduce energy consumption and traffic in unstructured P2P networks or wireless sensor networks by enabling nodes to filter out unnecessary data items. However, these methods do not protect against DRAs, and are unsuitable for use in MANETs because they are not adapted to node mobility.

In [1], [10], [25], and [26], we proposed top-k query processing methods for MANETs that adapt to node mobility, maintain high accuracy of the top-k result, and reduce traffic. In [10] and [25], we proposed methods in which query messages include the scores of data items, and nodes narrow down candidates that may include the global top-k result, resulting in reduced communication traffic. In [1] and [26], we proposed methods in which the query-issuing node first retrieves the k-th highest score (threshold) in the network, and then acquires data items with scores equal to or greater than the threshold. However, these methods are not designed for environments in which malicious nodes exist; for example, the data items in the top-k result are sent back along a single route, and thus are vulnerable to DRAs.

In [27] and [30]–[33], the authors proposed secure top-k query processing methods for environments in which there are some malicious nodes in the network. In [27], the authors proposed a method in which each sensor node sends each data item with both the hash value of one priority data item and that of one superior data item attached. After the source node receives the top-k result, it verifies the received data items by checking whether the received hash values correspond to the hash values calculated from the received data items. In these methods, the sender node protects against fabrication of data items by sending data items encrypted with a symmetric key. However, these methods cannot handle DRAs. In particular, in [32], the authors proposed a method against false data injection attacks, where malicious nodes generate new, false data items (i.e., other nodes’ data items, or data items whose scores are not the same as the scores calculated from the raw data items and query conditions) and send them back. However, we assume that raw data items are generated by special devices and software such as medical sensors, and can be read but cannot be modified even by the owner nodes. Therefore, we assume that malicious nodes perform DRAs, in which malicious nodes replace necessary data items with unnecessary yet proper data items.

C. REPUTATION SYSTEMS

In distributed systems where there are malicious or failed nodes, reputation systems, which evaluate the performance of nodes to exclude malicious nodes from the network, have been widely discussed. Our proposed method can be seen as a kind of reputation system, since each mobile node shares information about the malicious nodes and excludes the malicious nodes from the MANET.

In the field of sensor networks and MANETs, many reputation systems considering the reliability of nodes in the network have been proposed [3], [13], [14], [17], [21], [22], [24]. In [13], each node calculates the local reputation scores of other nodes from the correctness of received files, and floods the score information in the network. Then, each node calculates the global reputation score from its own and the received local scores. Finally, it judges nodes whose global score is lower than a threshold to be malicious nodes. In [3], [21], and [22], the authors proposed methods in which each node manages the reputation values of its neighboring nodes in a MANET. In these methods, each node overhears messages sent by neighboring nodes and determines the reputation scores of neighboring nodes by analyzing their messages. However, these methods do not assume that malicious nodes send false reputation scores.

In [7] and [8], the authors proposed methods against false notification attacks in reputation systems. In particular, in [8], source nodes exchange a cryptographic key with destination nodes in advance, and send their own ID, along with the past and current reputation scores of destination nodes, in encrypted form. The destination node decodes and confirms the received reputation scores. Thus, it can discard the false reputation scores. However, these methods assume information sharing only between source and destination nodes, whereas in our method nodes share information with all other nodes in the network.

III. ASSUMPTIONS

The system environment is assumed to be a MANET constructed from mobile nodes held by members of highly important collaborative work, such as rescue operations and military affairs, in which the members issue top-k queries to efficiently acquire data items. In the case of rescue operations, ambulance crews need to pick up victims in critical condition. Attackers such as terrorists hack a node held by an ambulance crew member, because the attackers aim to spread the damage over a long time. The crew member whose node has been hacked does not recognize that his/her own node has been hacked, and the malicious node sends data items that he/she does not intend.

A. SYSTEM MODEL

The set of all mobile nodes in the system is denoted by M = {M1, M2, ..., Mn}, where n is the total number of mobile nodes and Mi (1 ≤ i ≤ n) is a node identifier. The set of all data items in the entire network is denoted by D = {D1, D2, ..., Dd}, where d is the total number of data items and Di (1 ≤ i ≤ d) is a data identifier. Each data item is retained by a specific node. Since we assume highly important collaborative work, highly secure communications and data exchanges are essential. Therefore, we assume that each node has the public keys of all nodes in the network. When a node replies with data items (i.e., sends a reply message), it encrypts the data items using the public key of the destination node to prevent intermediate nodes from modifying and reading the data items. In addition, the node sends the reply message (including the encrypted data items) to some of its neighbors after encrypting the message using the neighbors’ public keys. This ensures secure communication with neighbors and prevents others from overhearing the message. On the other hand, when a node sends (or relays) a query, it broadcasts the query without encryption, since a query is not intended for specific nodes but should be sent to all neighbors.

The scores of data items can be calculated based on the query condition and specified scoring functions. Raw data items are generated by special devices and software (which are independent of the mobile nodes’ OS and applications) such as medical sensors, and can be read but cannot be modified even by the owner nodes. Therefore, even if a node is hacked by attackers, it cannot modify its own data items, i.e., a malicious node cannot generate incorrect or fake data items whose scores are unfairly high. This assumption also serves to achieve highly secure collaborative work.

In order to acquire the data items with the k highest scores in a MANET, each intermediate node should selectively send data items with higher scores. Therefore, the scores of reply data items in a reply message are not encrypted with the public key of the query-issuing node, i.e., each mobile node can know the scores of data items in the reply message.

B. ATTACK MODEL

In this paper, we assume that the number of malicious nodes in the network is m. A malicious node seeks a way to disrupt the query-issuing node’s acquisition of the global top-k result without being detected. If the malicious node falsifies the scores of its own data items or of others’ data items when relaying them, the query-issuing node can easily detect the attack by comparing the received data items’ scores (attached directly to the reply message) with the scores calculated from the received data items. Thus, in this paper, we assume that malicious nodes attempt only a DRA in top-k query processing. When a malicious node performs a DRA, it randomly replaces ⌈h · k⌉ data items in the local top-k result (where h denotes the rate of replacement) with its own data items, which have lower scores than the local top-k result.
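As an illustration of the DRA model above, the replacement step can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function name `data_replacement_attack`, the `(score, owner_id)` tuple representation, and the choice to cap the number of replacements by the number of available substitutes are all hypothetical.

```python
import math
import random


def data_replacement_attack(local_top_k, own_items, h, rng=None):
    """Simulate a DRA on a local top-k result.

    local_top_k and own_items are lists of (score, owner_id) tuples
    (a hypothetical representation). Returns the forged reply list.
    """
    rng = rng or random.Random()
    k = len(local_top_k)
    lowest = min(score for score, _ in local_top_k)
    # Only the attacker's items that score below every item in the
    # local top-k result are usable as substitutes.
    substitutes = sorted(
        (item for item in own_items if item[0] < lowest), reverse=True
    )
    # ceil(h * k) items are replaced, capped by the substitutes available
    # (the cap is an assumption of this sketch).
    n_replace = min(math.ceil(h * k), len(substitutes))
    # The positions to overwrite are chosen at random.
    victims = set(rng.sample(range(k), n_replace))
    kept = [item for i, item in enumerate(local_top_k) if i not in victims]
    return sorted(kept + substitutes[:n_replace], reverse=True)
```

With k = 4 and h = 0.5, two of the four relayed items are overwritten, so the forged reply still contains k proper data items and looks plausible to a receiver that does not know the global top-k result.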

Moreover, we assume that each node floods the entire network with the information about the identified malicious nodes in order to share it with other nodes. Aiming to confuse normal nodes and make them misjudge the malicious node identification, a malicious node performs an FNA, in which it reports some normal nodes as malicious nodes. If each malicious node randomly reports normal nodes as malicious nodes, the FNA is easily detected by other normal nodes because, in most cases, only this node claims that these normal nodes are malicious. Therefore, multiple malicious nodes collaborate to report the same normal nodes. This type of FNA has more influence on malicious node identification.
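The difference between uncoordinated and colluding FNAs can be illustrated by counting how many distinct nodes report each accused node. The following Python sketch is only an illustrative heuristic and is not the grouping-based identification method proposed in this paper; the function names and the dictionary layout (reporter ID mapped to the set of node IDs it reports as malicious) are assumptions.

```python
from collections import defaultdict


def accusers_per_node(notifications):
    """Map each accused node ID to the set of nodes that reported it.

    notifications: dict of reporter ID -> set of accused node IDs
    (a hypothetical layout for the flooded information).
    """
    accusers = defaultdict(set)
    for reporter, accused in notifications.items():
        for node in accused:
            accusers[node].add(reporter)
    return dict(accusers)


def singly_reported(notifications):
    """Return the accused nodes that are reported by exactly one node."""
    return {
        node
        for node, who in accusers_per_node(notifications).items()
        if len(who) == 1
    }
```

When M8 and M9 each accuse a different normal node, both accusations stand out as singly reported; when they collude and accuse the same node, the accusation is backed by two reporters and this simple count no longer exposes it.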

As mentioned, we assume that malicious nodes perform two types of attacks (i.e., DRA and FNA), but do not always perform both, i.e., they sometimes perform only one type of attack and sometimes perform both.

IV. PROPOSED METHOD

A. OVERVIEW

In our proposed top-k query processing method, the query-issuing node first floods a query over the entire network, and each node receiving the query stores information on all possible routes to the query-issuing node. Then, each receiving node replies with the data items with the k highest scores to two neighbor nodes. In addition, each node includes, in its reply message, information on the reply message forwarding routes, which consist of pairs of sender node and next node IDs. Based on this attached information, the query-issuing node can detect an attack occurring along a reply message route.

In MANETs, since the network topology changes dynamically due to the mobility of nodes, radio link disconnections can occur between nodes. Therefore, if a node detects a radio link disconnection along one of its two reply routes, it sends the data items to a different node, to which those data items have not yet been sent, to ensure that they are sent back along two different routes.





In our proposed malicious node identification method, a query-issuing node that detects a DRA narrows down the malicious node candidates based on the received reply messages. Then, the query-issuing node determines whether a given reply message sent back by a malicious node candidate includes replaced data items or not, by sending inquiries to nodes that received reply messages from this candidate. In this way, the query-issuing node can identify the malicious node.

Here, each node tends to identify neighboring malicious nodes, but can hardly identify malicious nodes that are far from it. Therefore, nodes share the information on identified malicious nodes to detect the malicious nodes quickly. Specifically, after each node identifies malicious nodes, it floods the information on the identified malicious nodes within the network. When each node has received a certain number of queries, it performs malicious node identification procedures based on the received information. Specifically, it divides nodes into groups based on similarities of the information on malicious nodes detected by those nodes, and then identifies malicious nodes based on the results of malicious node identification by these groups.

B. TOP-k QUERY PROCESSING

1) QUERY FORWARDING

First, the query-issuing node floods a query over the entire network. The query consists of the node identifier of the query-issuing node (Query-issuing node ID), the query identifier (Query ID), the number of requested data items (k), the query condition, and a list of the node identifiers of nodes on the path along which the query message is transmitted (Query path). Specifically, the query-issuing node, Mp, specifies the query condition and the number of requested data items, k. Then, Mp transmits a query message whose Query path includes its identifier, Mp, to its neighbor nodes. A node, Mq, which receives the query, forwards it according to Algorithm 1. In Algorithm 1, hop count denotes the number of hops to the query-issuing node, based on the number of nodes included in the Query path. Then, Mq sets a waiting time for reply (RD) according to the following equation:

$$RD = (hop_{max} - hop_{cnt}) \cdot T_{wait} \tag{1}$$

where $hop_{cnt}$ denotes the number of hops to the query-issuing node, $hop_{max}$ denotes the maximum number of hops (calculated based on the area size of the network and the radio range of nodes), and $T_{wait}$ is a positive constant. In this equation, as $hop_{cnt}$ increases, RD decreases, so nodes farther from the query-issuing node reply earlier. When Mq receives the same query again later, it stores the ID of the query sender node as its neighbor node, as well as the Query path and the number of hops (Lines 10-11 in Algorithm 1).
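The query handling described above, together with Eq. (1), can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `QueryState`, `on_query`, and `reply_wait_time` are hypothetical, the Query path is modeled as a plain list of node IDs, and the hop count is taken to be the length of the received Query path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def reply_wait_time(hop_cnt: int, hop_max: int, t_wait: float) -> float:
    """Eq. (1): RD = (hop_max - hop_cnt) * T_wait."""
    return (hop_max - hop_cnt) * t_wait


@dataclass
class QueryState:
    """Per-query state kept by a node (hypothetical structure)."""
    node_id: str
    parent: Optional[str] = None
    parent_path: List[str] = field(default_factory=list)
    neighbor_paths: Dict[str, List[str]] = field(default_factory=dict)
    reply_delay: Optional[float] = None


def on_query(state, query_path, hop_max, t_wait):
    """Handle a received query; return the path to rebroadcast, or None."""
    sender = query_path[-1]          # node ID at the end of Query path
    hop_cnt = len(query_path)        # assumed: hops = nodes in Query path
    if state.parent is None:
        # First reception: remember the parent and its path, set RD,
        # then append our own ID and forward the query.
        state.parent = sender
        state.parent_path = list(query_path)
        state.reply_delay = reply_wait_time(hop_cnt, hop_max, t_wait)
        return list(query_path) + [state.node_id]
    # Later receptions: only record the sender as a neighbor.
    state.neighbor_paths[sender] = list(query_path)
    return None
```

For example, with hop_max = 5 and T_wait = 0.1, a node two hops from the query-issuing node waits about 0.3 time units, while a node four hops away waits about 0.1, so replies from distant nodes arrive before nearer nodes start replying.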

Algorithm 1 Forwarding a Query
1: /* Receive a query message */
2: if Mq receives a query for the first time then
3:   Store Query path and hop count as its Parent Query path
4:   Store the node ID at the end of Query path as its parent
5:   Set RD for replying data items
6:   /* Send the query message to neighbor nodes */
7:   Add Mq’s node ID to the end of Query path
8:   Send the query to neighbor nodes
9: else
10:   Store Query path and hop count as its Neighbor Query path
11:   Store the node ID at the end of Query path as its neighbor
12: end if

2) REPLY FORWARDING

When RD has passed, each node sends back a reply message, which includes its own node identifier (Sender node ID), the identifier of the next node along the reply route (Dest node ID), a list of the data items (including their scores) and the node identifiers of the nodes possessing them (Data list), and a list summarizing the reply message routes, i.e., a list of the pairs of sender and next node identifiers (Forwarding Route).

In Algorithm 2, node Mr sends a reply message when its RD has passed. Here, REP denotes a reply message, REP.FR denotes the forwarding route list consisting of (Sender node ID, Dest node ID) pairs, i.e., the list of sender and next node identifier pairs, and R denotes the maximum number of times a reply message is re-sent. Mr selects as the next node the neighboring node that has the least hop count and the least overlap between its Query path and the parent node’s Query path (Line 9 in Algorithm 2).
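The next-node selection rule (least hop count, then least overlap with the parent’s Query path) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `select_dest_node` is hypothetical, each neighbor’s Query path is a list of node IDs whose length stands for its hop count, and ties on overlap are broken by node ID, which the text does not specify.

```python
def select_dest_node(neighbor_paths, parent_path):
    """Pick the second reply destination (DestNode) for a node.

    neighbor_paths: dict of neighbor ID -> Query path stored for it.
    parent_path: Query path stored for the parent node.
    Returns the chosen neighbor ID, or None if there is no neighbor.
    """
    if not neighbor_paths:
        return None
    # Step 1: keep only the neighbors with the minimum hop count.
    min_hops = min(len(path) for path in neighbor_paths.values())
    closest = {
        n: path for n, path in neighbor_paths.items() if len(path) == min_hops
    }
    # Step 2: among them, prefer the path sharing the fewest nodes with
    # the parent's path (ties broken by node ID, an assumption).
    parent_nodes = set(parent_path)
    return min(
        closest, key=lambda n: (len(parent_nodes & set(closest[n])), n)
    )
```

Choosing the neighbor whose path shares the fewest nodes with the parent’s path keeps the two reply routes as disjoint as possible, so a single malicious relay is less likely to sit on both of them.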

3) LINK DISCONNECTION

In MANETs, the network topology changes dynamically due to the movement of nodes. When a radio link disconnection to the parent node or next node occurs, a replying node, Mr, cannot send a reply message, resulting in reduced accuracy of the query result. Therefore, if a node sends a reply message R times but does not receive an ACK from the parent or next node, the sending node detects a radio link disconnection, at which point it sends the reply message to another neighbor node among those whose routes to the query-issuing node have the least overlap between the Query path in the reply message and their own Query path (Lines 27-30 in Algorithm 2). If the sending node has no neighbor nodes that satisfy this condition, it sends the reply message to a neighbor selected in the same way as the next node in the “Reply Forwarding” process, among the nodes to which the reply message has not been sent (Lines 35-36 in Algorithm 2).
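The fallback choice after a detected link disconnection can be sketched in Python. This sketch follows the branch structure of Lines 33-39 of the Algorithm 2 listing (discard, neighbor whose Query path includes DestNode, otherwise a random untried neighbor) rather than the prose description above; the function name `fallback_neighbor`, the restriction of every branch to neighbors that have not yet been sent the reply, and the deterministic pick among several matching neighbors are assumptions.

```python
import random


def fallback_neighbor(neighbor_paths, already_sent, dest_node, rng=None):
    """Choose where to resend REP after a link disconnection.

    neighbor_paths: dict of neighbor ID -> stored Neighbor Query path.
    already_sent: set of neighbor IDs that REP was already sent to.
    dest_node: ID of the unreachable DestNode.
    Returns a neighbor ID, or None when REP has to be discarded.
    """
    rng = rng or random.Random()
    untried = sorted(n for n in neighbor_paths if n not in already_sent)
    if not untried:
        return None  # REP was sent to all neighbors: discard it
    # Prefer a neighbor whose stored Query path includes DestNode.
    via_dest = [n for n in untried if dest_node in neighbor_paths[n]]
    if via_dest:
        return via_dest[0]
    # Otherwise pick randomly among the neighbors not selected yet.
    return rng.choice(untried)
```

Returning None models the “Discard REP” branch; a caller would stop retransmitting that reply message in this case.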

Algorithm 2 Sending a Reply Message
1: /* Send a reply message after RD has elapsed */
2: /* Select a node to send a reply message */
3: for each Neighbor do
4:   if Neighbor’s hopCount is the minimum then
5:     Insert Neighbor into DestNode
6:   end if
7: end for
8: if |DestNode| > 1 then
9:   Select a Neighbor whose Neighbor Query path least overlaps with the parent Query path as a DestNode
10: end if
11: Add the local top-k result to REP
12: for i = 0 to 1 do
13:   if i = 0 then
14:     Add (Mr, parent node) to received REP.FR and send REP to parent node
15:   else if i = 1 then
16:     Add (Mr, DestNode) to received REP.FR and send REP to DestNode
17:   end if
18: end for
19: /* Receive a reply message */
20: Send ACK to the sender node of REP
21: if before RD then
22:   Store REP
23: else if after RD and Mr receives a data item with a higher score than the kth-highest score among data items already sent then
24:   Send REP including new local top-k result to parent node and DestNode
25: end if
26: /* Resend the reply message */
27: if Mr does not receive ACK from its parent by waiting time for retransmission and the number of retransmissions < R then
28:   Resend REP to parent
29: else if Mr does not receive ACK from DestNode by waiting time for retransmission and the number of retransmissions < R then
30:   Resend REP to DestNode
31: else if the number of retransmissions > R then
32:   /* Mr detects the disconnection of radio link */
33:   if Mr has sent REP to all Neighbor then
34:     Discard REP
35:   else if Mr knows a Neighbor whose Neighbor Query path includes DestNode then
36:     Send REP to the Neighbor
37:   else
38:     Select randomly a Neighbor among Neighbors which have not been selected yet
39:     Send REP to the Neighbor
40:   end if
41: end if

4) DETECTING ATTACKS

After the query-issuing node, Mp, receives all the reply messages, it detects a DRA according to Algorithm 3. In Algorithm 3, Top-k Result denotes the data items with the k highest scores acquired by the query-issuing node, REP.Data and REP.FR respectively denote the data list and forwarding route included in the reply message, REP, and SendRoute denotes the set of node identifiers along the route from the node possessing a given data item to the query-issuing node (the query-issuing node can know the SendRoute from the forwarding route information). If the nodes that have data items in the top-k result are included in SendRoute (or REP.FR), but the data items in the top-k result are not included in REP.Data, the query-issuing node detects a DRA (Line 13 in Algorithm 3), and initiates the malicious node identification process described in Section IV-C1. If the query-issuing node does not detect a DRA, it completes the top-k query processing.

Algorithm 3 Detection of Attack
1: /* After the query-issuing node receives all reply messages */
2: INPUT: Top-k Result, REPs
3: OUTPUT: SendRoute
4: SendRoute ← ∅
5: for each REP do
6:   for each Top-k Result do
7:     if REP.FR includes the node ID of a node possessing a data item in Top-k Result and REP.Data does not include the data item then
8:       Insert a route from the node with the missing data item to the query-issuing node into SendRoute
9:     end if
10:   end for
11: end for
12: if SendRoute ≠ ∅ then
13:   Detect Attack
14: end if
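The detection check of Algorithm 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `detect_dra` and `route_to_issuer` are hypothetical, data items are `(score, owner_id)` tuples, each reply is a dictionary with a `data` list and an `fr` list of `(sender, next)` pairs, and each sender is assumed to appear only once in a given forwarding route list.

```python
def route_to_issuer(fr, start, issuer):
    """Follow (sender, next) pairs from start toward the issuer."""
    next_hop = dict(fr)  # assumes one next node per sender in this REP
    route = [start]
    # The length bound guards against malformed, cyclic route lists.
    while (
        route[-1] != issuer
        and route[-1] in next_hop
        and len(route) <= len(fr)
    ):
        route.append(next_hop[route[-1]])
    return route


def detect_dra(top_k_result, replies, issuer):
    """Return the SendRoutes of top-k items missing from a reply.

    A non-empty result means that a DRA is detected.
    """
    send_routes = []
    for rep in replies:
        senders = {sender for sender, _ in rep["fr"]}
        data = set(rep["data"])
        for item in top_k_result:
            owner = item[1]
            # The owner's reply passed along this route, yet its top-k
            # item is absent from the data list: something replaced it.
            if owner in senders and item not in data:
                route = route_to_issuer(rep["fr"], owner, issuer)
                if route not in send_routes:
                    send_routes.append(route)
    return send_routes
```

For instance, if a reply relayed by M2 lists M5 and M6 in its forwarding route but carries neither of their top-k items, the function returns the routes M5, M2, M1 and M6, M2, M1, while a reply that carries all expected items contributes nothing.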

FIGURE 2. Example of reply forwarding.

Fig. 2 shows an example in which the query-issuing node, M1, where k = 3, detects a DRA. Table 1 shows the scores of data items retained by each node. Here, the malicious node, M2, replaces the data items having the highest and second highest scores among the received data items (score: 98, 91) with data items it possesses, whose scores are lower than the third highest data item it received, and sends the corrupted data items (score: 86, 72, 65). On receiving the reply message from M2, M1 can know that M5 and M6, which have data items included in the top-k result, are on the forwarding route included in the reply message from M2. However, the data items received from M2 do not include the data items of the top-k result that M5 and M6 possess (score: 98, 91). Therefore, the query-issuing node detects that the data items it received from M2 have been corrupted (attacked), and knows precisely which data items have been replaced.

TABLE 1. Scores of data items retained by each node.

Algorithm 4 Narrowing Down the Malicious Node Candidates
1: INPUT: SendRoute
2: OUTPUT: Candidate
3: /* The query-issuing node confirms whether the reply message included replaced data items */
4: for each node ID in SendRoutes do
5:   if node ID is included in SendRoute then
6:     Insert node ID into Candidate
7:   end if
8: end for
9: if |Candidate| = 1 then
10:   return Candidate as a Malicious Node
11: else if |Candidate| > 1 then
12:   /* Inquire to the candidates of a malicious node */
13:   Perform the procedure of Algorithm 5
14: end if

C. MALICIOUS NODE IDENTIFICATION METHOD

1) LOCAL IDENTIFICATION

After detecting a DRA, the query-issuing node tries to identify the malicious nodes. In Algorithms 4 and 5, the query-issuing node narrows down the candidates for malicious nodes, and identifies the malicious nodes by making respective inquiries. In our proposed method, according to Algorithm 4, the query-issuing node narrows down the malicious node candidates by using SendRoute, obtained in Algorithm 3. In Algorithm 4, Candidate denotes the set of node identifiers of malicious node candidates, ordered by ascending hop count from the query-issuing node, and missing Top-k result denotes the replaced data items. The nodes included in SendRoute, whose data items are corrupted (by the malicious node), are all possible attackers.

Algorithm 5 Identification of a Malicious Node
1: INPUT: Candidate
2: OUTPUT: MaliciousNode
3: /* Mp starts to inquire */
4: for each i in Candidate.size do
5:   if InqRoute include other candidates in Candidate then
6:     /* End procedure without inquiring */
7:     break
8:   else if hop count to Candidate > 1 then
9:     /* Send an inquire message */
10:     Send MNI-INQ to Mdesti to ask data items that Candidate[i] sent
11:   end if
12:   /* Mv receives an inquire message */
13:   if Mv receives MNI-INQ then
14:     Send MNI-INQ to the next node of Mv in InqRoute
15:   end if
16:   /* A malicious node candidate receives an inquire message */
17:   if Mdesti receives MNI-INQ then
18:     Send MNI-IREP including scores of data items sent by Candidate[i] to Mp
19:   end if
20:   /* Mu receives a reply message for the inquiry */
21:   if Mu receives MNI-IREP then
22:     Send MNI-IREP to sender MNI-INQ
23:   end if
24:   /* Mp receives a reply message for the inquiry */
25:   if Mp receives MNI-IREP then
26:     /* the query-issuing node identifies the malicious node */
27:     if scores includes the score of the missing data items in global Top-k result then
28:       return Candidate[i − 1]
29:     end if
30:   end if
31: end for

Therefore, the query-issuing node recognizes these nodes as malicious node candidates. When the number of malicious node candidates is one, the query-issuing node identifies this node as the malicious node and completes the procedure (Line 10 in Algorithm 4).
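The narrowing step can be sketched in Python as follows. The sketch assumes that the forwarding route information is available as (sender, next node) pairs and that each sender has a single recorded next hop; the function name `narrow_candidates` and its parameters are hypothetical and not part of the proposed method.

```python
def narrow_candidates(forward_pairs, owners, issuer):
    """Relay nodes between the owners of the missing data items and the
    query-issuing node, sorted by ascending hop count from the issuer."""
    next_hop = dict(forward_pairs)   # assumption: one next hop per sender
    hops = {}
    for owner in owners:
        path, seen = [], {owner}
        node = next_hop.get(owner)
        while node is not None and node != issuer and node not in seen:
            path.append(node)
            seen.add(node)
            node = next_hop.get(node)
        if node != issuer:
            continue   # the recorded route never reaches the issuer
        # path runs from the owner side toward the issuer, so reverse it
        for hop, relay in enumerate(reversed(path), start=1):
            hops[relay] = min(hop, hops.get(relay, hop))
    return sorted(hops, key=lambda n: (hops[n], n))


# Route of the running example: M5, M6 -> M4 -> M2 -> M1
pairs = [("M5", "M4"), ("M6", "M4"), ("M4", "M2"), ("M2", "M1")]
candidates = narrow_candidates(pairs, ["M5", "M6"], "M1")

# A single candidate is identified directly (Line 10 in Algorithm 4);
# otherwise the inquiry procedure of Algorithm 5 is needed.
malicious = candidates[0] if len(candidates) == 1 else None
```

For the example route, the candidates are M2 (hop count one) followed by M4 (hop count two), so the inquiry procedure is required.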

Algorithm 5 shows the procedures for inquiring about information on data items sent from malicious node candidates. Here, MNI-INQ denotes an inquiry message, which contains the query-issuing node identifier, the node identifier of the destination node for the inquiry message (Mdesti), the set of malicious node candidate identifiers (Candidate), and the forwarding route of the inquiry message from the query-issuing node to the destination node (InqRoute). Mdesti denotes the destination node to which Candidate[i] (ith candidate) has sent a reply message. MNI-IREP denotes a message sent in reply to the inquiry message, which contains the scores of the data items, and the identifiers of nodes possessing these data items, which are included in the reply message received from the Candidate[i] node.

The query-issuing node, Mp, tries to identify the malicious node by examining the malicious node candidates successively in ascending order of hop count. Specifically, it determines InqRoute for each malicious node candidate so that it does not include the malicious node candidate, and sends an inquiry message to nodes at the top of the InqRoute. An MNI-INQ message is not sent to nodes whose hop count is one, because the query-issuing node receives reply messages directly from such nodes (Line 8 in Algorithm 5). After Mp has received a reply message to its MNI-INQ, it identifies the malicious nodes. Specifically, if the data items sent by Candidate[i] still include the missing top-k data items (i.e., they had not yet been replaced when Candidate[i] forwarded them), Mp identifies the candidate with a hop count of one less than that of Candidate[i] (i.e., Candidate[i − 1]), as the malicious node (Line 28 in Algorithm 5), and completes the procedure.

FIGURE 3. Identification of the malicious node.

Fig. 3 shows an example of how the query-issuing node identifies the malicious node after detecting a DRA, in the cases where the data items included in the top-k result possessed by M5 and M6, are not included among the data items in the reply message from M2. The query-issuing node, M1, determines each route along which a reply message is transmitted from M5 or M6 to M1 via M2 by checking the reply message received from M2. M1 designates M2 and M4 as the malicious node candidates, since they are included in the SendRoute. Therefore, M1 sends a MNI-INQ message to M3, since malicious node candidate, M4, has sent a reply message to M3, and M3 is not itself a malicious node candidate. Since the hop count of M2 is one (i.e., M1 receives replies directly from M2), M1 does not send a MNI-INQ to M2. When M3 receives the MNI-INQ message, it sends the scores of the data items (score: 98, 91, 86) sent to it by M4, to the query-issuing node, M1. When M1 receives this reply message, it confirms whether M4 has sent data items included in the top-k result (score: 98, 91). Finally, since M4 has sent the correct data items and M2 has not, M1 identifies M2 as the malicious node.
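The decision logic of this example can be sketched in Python. The message exchange is abstracted into a callable `inquire`, a hypothetical stand-in for the MNI-INQ/MNI-IREP round trip that returns the scores a candidate forwarded, or `None` when no reply arrives. The sketch assumes the first candidate is at hop count one, so no inquiry is made about it, and returning `None` on a lost reply is an assumption of this sketch rather than part of Algorithm 5.

```python
def identify_malicious(candidates, missing_scores, inquire):
    """candidates: node IDs in ascending hop count from the issuer.
    Returns the identified malicious node, or None if undetermined."""
    missing = set(missing_scores)
    for i in range(1, len(candidates)):
        scores = inquire(candidates[i])
        if scores is None:
            return None   # assumed handling: no MNI-IREP was received
        if missing <= set(scores):
            # Candidate[i] still forwarded the missing items intact, so
            # the replacement happened at the next candidate toward Mp.
            return candidates[i - 1]
    return None


# Fig. 3 example: M3 reports that M4 sent the scores 98, 91, 86.
replies = {"M4": {98, 91, 86}}
culprit = identify_malicious(["M2", "M4"], {98, 91}, replies.get)
```

With the values of the example, `culprit` is M2, which agrees with the conclusion reached by M1.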

2) SENDING NOTIFICATION MESSAGES
After identifying the malicious nodes, the query-issuing node floods the information on the identified malicious nodes within the network. More specifically, the query-issuing node, Mp, sends a notification message to its neighboring nodes. The notification message contains the query identifier of the query (QNum), the node identifier of the query-issuing node (Mp), and the list of the node identifiers of the identified malicious nodes (BLp). A node that receives the notification message for the first time stores it and forwards it to its neighboring nodes. A node that receives the same notification message again ignores it and does not forward it again. Hence, all nodes share the information on the identified malicious nodes in the network.
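The flooding of a notification message can be sketched in Python under simplifying assumptions: the topology is static during the flood, no packets are lost, and a duplicate is recognized by the pair (QNum, Mp) and dropped, as is usual in flooding. The function name `flood_notification` and the dictionary layout of the message are hypothetical.

```python
from collections import deque


def flood_notification(adjacency, issuer, notification):
    """Return (stored, broadcasts): the messages each reached node keeps
    and the number of broadcasts made during the flood."""
    key = (notification["QNum"], notification["Mp"])
    stored = {issuer: {key: notification}}
    queue = deque([issuer])
    broadcasts = 0
    while queue:
        sender = queue.popleft()
        broadcasts += 1                      # one broadcast per node
        for neighbour in adjacency[sender]:
            inbox = stored.setdefault(neighbour, {})
            if key in inbox:
                continue                     # duplicate: ignored
            inbox[key] = notification        # first copy: store it
            queue.append(neighbour)          # and forward it later
    return stored, broadcasts


adjacency = {"M1": ["M2"], "M2": ["M1", "M3"], "M3": ["M2"], "M4": []}
message = {"QNum": 7, "Mp": "M1", "BLp": ["M2"]}
stored, broadcasts = flood_notification(adjacency, "M1", message)
```

In this small topology M4 is disconnected and never receives the message, which mirrors the fact that sharing only reaches nodes connected to the query-issuing node.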

3) GLOBAL IDENTIFICATION
In our method, each node individually identifies malicious nodes using the shared information in two steps: node grouping and malicious node identification.

Node Grouping: Each node divides nodes in the network into some groups based on the information in the notification messages received by the nodes, according to Algorithm 6. In Algorithm 6, each node starts this process (i.e., grouping) after receiving NumQuery queries. Ri (i = 1, 2, . . . , n) denotes the evaluation score by Mi, which is represented by an n-dimensional vector, and indicates the malicious nodes identified by Mi. More specifically, the j-th element of Ri (j = 1, 2, . . . , n) is set to 1 when Mi identified Mj as the malicious node, and 0 otherwise. sim(a, b) denotes the similarity of evaluation scores between Ma and Mb. Group denotes groups determined by node grouping, Gcan denotes candidates of groups, and Groupg denotes the g-th group in Group. BLg denotes malicious nodes identified by nodes in Groupg, Mg,e denotes a node in Groupg, and CountBLg,f denotes the number of nodes which identify Mf included in BLg as a malicious node among nodes in Groupg. θ denotes the threshold for the grouping, and ρ denotes the threshold for the cleaning, which is represented by ρ = |Groupg| · α (here, |Groupg| denotes the number of nodes in Groupg, and α denotes a system parameter (0 ≤ α ≤ 1)).

First, each node calculates the similarity of nodes in terms of identified malicious nodes based on the received notification messages. In order to decrease the influence of differences in the number of identified malicious nodes among nodes, we adopt cosine similarity for similarity calculation (line 7 in Algorithm 6). After the node grouping, some groups may include both normal and malicious nodes. Therefore, the node performs a cleaning in each group to remove the inconsistency. Specifically, if a node, Mg,e, in a certain Groupg, identifies another node in the same group as a malicious node, Mg,e is eliminated from Groupg (line 28 in Algorithm 6). After that, a node identifying another node which is identified by less than a certain number of nodes in the same group, is also eliminated from the group (line 32 in Algorithm 6).





Algorithm 6 Node Grouping
1: /* After receiving queries NumQuery times */
2: INPUT: Ri (i = 1, 2, ..., n)
3: OUTPUT: Group
4: /* Calculate the similarity between nodes */
5: for each a ∈ n do
6:   for each b ∈ n do
7:     sim(a, b) = cos(a, b) = (Ra · Rb) / (‖Ra‖ ‖Rb‖)
8:   end for
9: end for
10: /* Node grouping */
11: for each a ∈ n do
12:   for each b ∈ n do
13:     if Malcan = ∅ and sim(a, b) ≥ θ then
14:       Insert Ma, Mb into Gcan
15:     else if Malcan ≠ ∅ and {∀x ∈ Malcan, sim(x, b) ≥ θ} then
16:       Insert Mb into Gcan
17:     end if
18:   end for
19:   if Gcan ∉ Group then
20:     Group ← Gcan
21:   end if
22:   clear Gcan
23: end for
24: /* Cleaning in each group */
25: for each Group do
26:   for each Mg,e in Groupg do
27:     if Mg,e identified a node included in Groupg then
28:       eliminate Mg,e from Groupg
29:     end if
30:     for each BLg,f in BLg do
31:       if CountBLg,f ≤ ρ and Mg,e identifies BLg,f then
32:         eliminate Mg,e from Groupg
33:       end if
34:     end for
35:   end for
36: end for

By doing so, even if a malicious node sends a false notification message which includes fake information on the identified nodes (i.e., claiming normal nodes as malicious nodes) as well as the same information as other normal nodes to achieve high similarity, the malicious node and the fake information can be eliminated from the group.
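The similarity calculation and the cleaning step can be sketched in Python. The report sets below are hypothetical values chosen for illustration, and the sketch assumes that ρ and the counts CountBLg,f are computed on the group as it was before cleaning, which Algorithm 6 does not state explicitly. The names `cosine`, `to_vector`, and `clean_group` are likewise hypothetical.

```python
import math


def cosine(ra, rb):
    """Cosine similarity of two 0/1 evaluation vectors."""
    dot = sum(x * y for x, y in zip(ra, rb))
    norm = math.sqrt(sum(x * x for x in ra)) * math.sqrt(sum(y * y for y in rb))
    return dot / norm if norm else 0.0


def clean_group(group, reports, alpha):
    """Remove members that name another member of the group, or that
    name a node identified by at most rho members of the group."""
    rho = len(group) * alpha
    counts = {}
    for member in group:
        for target in reports.get(member, set()):
            counts[target] = counts.get(target, 0) + 1
    kept = set()
    for member in group:
        named = reports.get(member, set())
        if named & (group - {member}):
            continue                      # names a node of its own group
        if any(counts[t] <= rho for t in named):
            continue                      # names a weakly supported node
        kept.add(member)
    return kept


nodes = [f"M{i}" for i in range(1, 11)]


def to_vector(named):
    return [1 if n in named else 0 for n in nodes]


# Hypothetical notification contents for one group of five nodes
reports = {m: {"M2", "M5"} for m in ("M3", "M6", "M7", "M10")}
reports["M9"] = {"M2", "M5", "M8"}
sim = cosine(to_vector(reports["M3"]), to_vector(reports["M9"]))
cleaned = clean_group({"M3", "M6", "M7", "M9", "M10"}, reports, alpha=0.2)
```

With these assumed reports, `sim` is 2/√6 (about 0.82), so M9 would be grouped with the other four nodes for θ = 0.7, and the cleaning step then removes M9 because M8 is named by only one member while ρ = 1.0.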

Table 2 shows an example of notification messages. In Table 2, the first row shows nodes sending the notification messages (i.e., M1, . . . , M10), and the second row shows malicious nodes identified by the nodes of the first row. Table 3 shows the similarities between each pair of nodes, which are calculated based on the messages in Table 2. In this example, let us assume that M2, M5 and M8 are malicious nodes. In the case that two nodes notify the same nodes as the malicious nodes, the similarity between these nodes is 1. When θ = 0.7, nodes are firstly divided into G1 = {M1, M3, M4, M6, M7, M10}, G2 = {M1, M4, M8}, G3 = {M2, M5}, and G4 = {M3, M6, M7, M9, M10}. In G2, because M8 identifies M1 as the malicious node, M8 is eliminated from G2. Therefore, G2 = {M1, M4}. When α is set to 0.2, ρ is 1.0 in G4. Since M8 is identified by only M9, M9 is eliminated from G4, i.e., G4 = {M3, M6, M7, M10}.

TABLE 2. Example of notification messages.

TABLE 3. Similarity of the information on identified malicious nodes between each pair of nodes.

Malicious Node Identification: After the node grouping, each node conclusively determines malicious nodes based on the information about malicious nodes identified by nodes in each group. Here, there are three types of groups, i.e., a group composed of (i) only normal nodes, (ii) only malicious nodes, and (iii) both normal and malicious nodes.¹

The nodes identified as malicious by all nodes in a group of (i) or (iii) are surely malicious nodes. Only in a group of (ii) can normal nodes be identified as malicious by all nodes, namely when those nodes collaboratively perform FNAs. Here, since malicious nodes are generally minorities in the entire network, majority-based judgment (and pruning) works well for malicious node identification. Therefore, in our method, nodes are confirmed to be malicious when they are determined to be malicious by a number of groups equal to or larger than a certain threshold.

Each node determines malicious nodes according to Algorithm 7. Here, Malicious denotes conclusive malicious nodes and Mal denotes nodes identified as malicious by all the nodes in each group. Mx denotes a node notified as a malicious node and CountMx denotes the number of groups where all the nodes identify Mx as a malicious node.

¹ Note that we cannot know for certain into which type each group is categorized.





Algorithm 7 Global Identification
1: /* After grouping nodes in the network */
2: INPUT: Group
3: OUTPUT: Malicious
4: /* Calculate the malicious nodes identified by all nodes in each group */
5: while All CountMx < φ do
6:   All CountMx and Mal clear
7:   for each Group do
8:     if Mx is identified by all nodes in Groupg then
9:       Mal ⇐ Mx
10:       CountMx++
11:     end if
12:   end for
13:   /* Identified the malicious nodes */
14:   for each My in Mal do
15:     if CountMy ≥ φ then
16:       Malicious ⇐ My
17:     end if
18:   end for
19:   for each Malicious do
20:     for each Group do
21:       if Maliciousm ∈ Groupg then
22:         Delete Maliciousm from Groupg
23:       end if
24:     end for
25:   end for
26: end while

φ denotes the threshold for each node to conclusively determine malicious nodes, which is given by φ = GroupNum · β (0 ≤ β ≤ 1), where GroupNum is the number of groups and β is a system parameter. In this method, if the number of groups in which all nodes identified Mx (Mx ∈ Mal) as a malicious node is equal to or larger than the threshold, φ, Mx is conclusively determined as a malicious node and added to Malicious (line 15 in Algorithm 7).

Table 4 shows an example of conclusively determining malicious nodes. The first row of Table 4 shows the group identifier, the second row shows the query-issuing nodes included in each group, and the third row shows Mal.

TABLE 4. Example of identified malicious nodes.

When β is set to 0.3, the threshold φ becomes 1.2. According to Table 4, because M5 is included in three groups, it is conclusively determined as a malicious node. Then, M5 is eliminated from G3. On the other hand, since M1, M2 and M4 are each identified by only one group, they are not determined as malicious nodes.
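The threshold-based decision can be sketched in Python. The groups and reports below are hypothetical values chosen to reproduce the counts stated in the example (M5 unanimous in three of four groups; M1, M2, and M4 in one group each); they are not the contents of Table 4. The sketch also assumes that φ is computed once from the initial number of groups and that iteration stops when no further node reaches the threshold.

```python
def global_identification(groups, reports, beta):
    """Nodes that are unanimously identified in at least phi groups."""
    groups = [set(g) for g in groups]
    phi = len(groups) * beta          # assumption: GroupNum is fixed
    malicious = set()
    while True:
        counts = {}
        for group in groups:
            if not group:
                continue
            # Mal of this group: nodes named by every member
            unanimous = set.intersection(
                *(set(reports.get(m, ())) for m in group))
            for node in unanimous - malicious:
                counts[node] = counts.get(node, 0) + 1
        newly = {n for n, c in counts.items() if c >= phi}
        if not newly:
            return malicious
        malicious |= newly
        groups = [g - newly for g in groups]   # delete from all groups


# Hypothetical groups and notification contents
groups = [{"M3", "M6"}, {"M7", "M10"}, {"M5", "M8"}, {"M9"}]
reports = {
    "M3": {"M5", "M2"}, "M6": {"M5", "M2"},
    "M7": {"M5", "M4"}, "M10": {"M5", "M4"},
    "M5": {"M1"}, "M8": {"M1"},
    "M9": {"M5"},
}
result = global_identification(groups, reports, beta=0.3)
```

With β = 0.3 and four groups, φ is 1.2, so only M5 (counted in three groups) is conclusively determined as malicious, and it is then deleted from the third group before the next pass.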

V. DISCUSSION
A. CASES OF NOT DETECTING A DRA
In our top-k query processing method, each node sends back data items to two neighbor nodes, and the query-issuing node successfully acquires data items in the top-k result, even if one of the routes includes a malicious node, because the alternate route can safely ensure that the required data items are properly sent back. However, especially when the node density in the network is low, some nodes may not have multiple neighbor nodes, and can send back data items along only one route. If data items are sent through a malicious node on the single route, or both nodes on the multiple paths are malicious, the query-issuing node will not acquire data items replaced by the malicious node. Moreover, in our proposed method, the query-issuing node can detect attacks only when it receives reply messages from multiple nodes. For example, when the query-issuing node has only one neighbor node, it cannot detect attacks. In Fig. 4 (a), the query-issuing node, M1, receives a reply message only from M2, and thus cannot recognize the DRA.

On the other hand, depending on the given network topology, malicious nodes are sometimes unable to replace the requested data items and cannot disrupt the acquisition of the top-k result. In Fig. 4 (b), for example, since the malicious node, M4, does not receive reply messages from any other nodes, it cannot attack, and the query-issuing node can acquire the correct top-k result. In this case, the malicious node sends normal reply messages, because other nodes may recognize the node as malicious if it ignores the message. In Fig. 4 (c), though the malicious node, M4, replaces data items, the corrupted local top-k result is not included in the global top-k result. Thus, the global top-k result is not affected by the DRA. Of course, in these cases, though the query-issuing node can acquire the data items in the global top-k result, it cannot detect the DRA or identify the malicious node.

FIGURE 4. Cases in which attacks cannot be detected. (a) Unrecognized attack. (b) Disabled attack. (c) Ineffective attack.

B. INFLUENCE OF FNA
In the global identification method, each node identifies malicious nodes based on the shared information on the identified malicious nodes. However, the malicious nodes attempt FNAs to confuse normal nodes and make them misidentify malicious nodes. In addition, to disturb the identification, some malicious nodes may do only FNAs (we call them liar nodes). In this section, we discuss the influence of FNA.

First, when a malicious node notifies the information on a randomly selected node as a malicious node, there is little influence on the identification. This is because only a few malicious nodes claim the same normal node as malicious while other (many) nodes do not. Therefore, even by a majority-based method, where each node identifies a node as malicious when the number of nodes identifying it is more than a threshold, the malicious nodes are substantially identified. Our proposed method can also defend against FNAs, since the similarity between normal nodes and malicious (or liar) nodes is low, i.e., malicious nodes have little possibility to be classified into the same groups with normal nodes, and the number of groups consisting of normal nodes is generally much larger than that of malicious nodes. Therefore, there is little possibility to determine normal nodes as malicious nodes.

Next, some malicious and liar nodes may collaboratively claim the same normal node as a malicious node. In this case, because the number of nodes which notify a normal node as a malicious node becomes large, by a simple majority-based method, the normal node may be conclusively determined as a malicious node. Here, it should be noted that normal nodes tend to identify nearby malicious nodes, and thus, the identified malicious nodes have some diversity among them. In our proposed method, since nodes which identify the same nodes as malicious are usually classified into the same groups, the number of groups including malicious nodes which have done FNAs is small. Therefore, misidentification caused by FNAs occurs less often than in the simple majority method. Moreover, malicious nodes may try to increase their similarity with other nodes (for example, by announcing information that includes nodes identified by normal nodes as malicious), in order to increase the number of groups that include them. In our proposed method, only nodes identified by all nodes in each group are decided as malicious nodes identified by the group. Therefore, there is little possibility to conclusively determine normal nodes as malicious nodes.

VI. SIMULATION EXPERIMENT
In this section, we discuss the results of the simulation experiments conducted to evaluate the performance of our proposed methods. For the simulation experiments, we used a network simulator, QualNet5.2.²

A. SIMULATION MODEL
The number of mobile nodes in the entire system is 50 (M1, M2, . . . , M50). These mobile nodes exist in an area of 500 [m] × 500 [m] and move according to the random waypoint model [4], with the speed and pause time set at 0.5 [m/sec] and 30 [sec], respectively. The initial position is randomly determined. Each mobile node transmits messages and data items using an IEEE 802.11b device whose data transmission rate is 11 [Mbps]. The transmission range of each mobile node is roughly 100 [m]. Each mobile node has 50 data items, whose size is 128 [B]. The score of each data item is randomly determined from a range of 1 to 999. The maximum number of times of resending data items, R, is 3.

m malicious nodes (denoted by “MN”), which do both DRA and FNA, and l liar nodes (denoted by “LN”), which do FNA only, are randomly determined among all the nodes in the network. A malicious node replaces ⌈0.5 · k⌉ data items, received from other nodes, with its own data items as DRA. When a node does an FNA, it floods a notification message that lists the same normal nodes together with the malicious node included in the first notification message it received from another node. Consequently, all nodes doing FNAs basically notify the same normal node as a malicious node, but they do not notify themselves.
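The DRA model used in the simulation can be sketched in Python as follows. The sketch assumes that the replaced items are the highest-scored ones among the received data items, which matches the example of Section IV where the items with scores 98 and 91 are replaced; the function name `apply_dra` is hypothetical, and data items are reduced to their scores.

```python
import math


def apply_dra(received_scores, own_scores, k):
    """Replace the ceil(0.5 * k) highest received scores with the
    attacker's own best scores and return the forwarded list."""
    n_replace = math.ceil(0.5 * k)
    top = sorted(received_scores, reverse=True)[:k]
    kept = top[n_replace:]                       # items left untouched
    substitutes = sorted(own_scores, reverse=True)[:n_replace]
    return sorted(kept + substitutes, reverse=True)


# k = 3: two of the three received items are replaced.
forwarded = apply_dra([98, 91, 86], [72, 65, 40], k=3)
```

For k = 3 this yields the corrupted reply (86, 72, 65) of the earlier example, and for k = 1 the single highest-scored item is always replaced.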

TABLE 5. Parameter configuration.

Table 5 shows the parameters used in the simulation experiments, and their values. These parameters are basically fixed to the constant values to the left of the parenthetical values, and each is varied over the range specified in the parenthesis in a simulation experiment. In the following, we evaluate our proposed top-k query processing method and malicious node identification method: local identification in Section VI-B and global identification in Section VI-C.

² Scalable Network Technologies: Creators of Qualnet Network Simulator Software, <http://www.scalable-networks.com>

B. TOP-k QUERY PROCESSING AND LOCAL IDENTIFICATION
We compare the performance of our proposed top-k query processing method with that of the naive method. In the naive method, the query-issuing node floods a query over the entire network, and a node receiving the query sends the local top-k result only to its parent node. We evaluate the following criteria for each method where the query-issuing node is randomly selected every 30 [sec] and this process is repeated 1,000 times.
• Accuracy of the query result: the average ratio of the number of data items included in the top-k result, which are acquired by the query-issuing node, to k.

• Traffic: the average of the total traffic volume required for processing a top-k query and that for identifying the malicious nodes. Table 6 shows each message size in our proposed and naive method. In Table 6, in our proposed method, i denotes the number of node identifiers included in Query path attached to the query message, j denotes the number of pairs of sender and next node included in the forwarding route list attached to the reply message, and l denotes the number of nodes on the route along which the inquiry message is sent.

TABLE 6. Message types and their size.

• Malicious Node Identification Ratio: the ratio of the number of malicious nodes identified by the query-issuing node for a single query in our proposed method to m. In the naive method, this ratio is always 0 since the query-issuing node cannot detect an attack.
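The two ratio-based criteria can be written down as a short Python sketch for a single query; averaging over the 1,000 queries is omitted. The function names are hypothetical, and data items are again reduced to their scores, assuming scores are distinct within one query.

```python
def query_accuracy(acquired_scores, true_topk_scores):
    """Fraction of the true top-k items acquired by the issuer."""
    k = len(true_topk_scores)
    return len(set(acquired_scores) & set(true_topk_scores)) / k


def identification_ratio(identified, malicious):
    """Fraction of the m malicious nodes identified in one query."""
    return len(set(identified) & set(malicious)) / len(malicious)


accuracy = query_accuracy([98, 86, 72], [98, 91, 86])
ratio = identification_ratio({"M2"}, {"M2", "M5", "M8", "M11", "M20"})
```

In this illustration two of the three top-k items are acquired, and one of five malicious nodes is identified.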

1) IMPACT OF THE NUMBER OF REQUESTED DATA ITEMS
We examine the effect of the number of requested data items, k. Fig. 5 shows the simulation result. In these graphs, the x-axis indicates k, and the y-axis indicates the accuracy of the query result in Fig. 5 (a), the traffic in Fig. 5 (b), and the malicious node identification ratio in Fig. 5 (c). In the graphs of the accuracy of the query result (Figs. 5 (a) and 6 (a)), “Attack” means cases that the malicious nodes performed effective DRAs (i.e., the cases that the query-issuing node can detect DRAs or cannot detect DRAs like Fig. 4 (a)), and “no-Attack” means cases that DRAs of malicious nodes did not affect the query result (i.e., the cases shown in Figs. 4 (b) and (c)).³

From Fig. 5 (a), as k increases, the accuracy of the query result typically decreases in both methods, because packet losses increase with the increase in size of replies containing k data items. In the case of “Attack” in the naive method, the accuracy of the query result becomes zero when k equals 1. This is because the query-issuing node never acquires the data item with the highest score, since the malicious nodes replace this data item in all cases. In our proposed methods, though malicious nodes attack, the accuracy of the query result remains high, because data items are sent back along multiple routes. Even if a data item is lost on a route due to packet loss, it can be sent back to the query-issuing node along another route.

³ Note that even in these cases, the query-issuing node sometimes cannot acquire the global top-k result due to packet losses.

FIGURE 5. Effect of k. (a) Accuracy of the query result. (b) Traffic. (c) Malicious node identification ratio.

From Fig. 5 (b), as k increases, traffic increases because of the increase in reply message size. In the proposed method, the traffic is much larger than in the naive method, because each node sends reply messages only to its parent node in the naive method, but to two nodes in our method. In fact, in the proposed method, the traffic is more than twice as large as in the naive method. This is because messages are also re-sent more often due to packet loss. Meanwhile, the traffic required to identify malicious nodes in the proposed method is small because the MNI-INQ messages do not include data items (only data scores). Thus, the query-issuing node can identify malicious nodes with much less traffic than is required by its acquisition of the query result.

From Fig. 5 (c), in our proposed method, regardless of k, the ratio that the query-issuing node did not identify the malicious nodes is more than 50%. Here, this includes cases (about 20%) in which the query-issuing node did not detect DRAs since the attacks were not effective, as mentioned in Section V-A. That is, the ratio that the query-issuing node detected DRAs but did not identify the malicious node is about 30%. This is because the query-issuing node sometimes cannot receive MNI-IREP due to packet loss. Moreover, when multiple malicious nodes are present in the network, the query-issuing node may not succeed in making identification inquiries, because malicious nodes are present on the inquiry path. However, from Fig. 5 (a), our proposed methods can maintain the accuracy of the query result, and local identification is effective because the query-issuing node identifies some malicious nodes (about 50%) at a single query. As k increases, the identification ratio increases. This is because, as k increases, the number of data items in the global top-k result replaced by malicious nodes (i.e., ⌈0.5 · k⌉) increases, and thus, occasions to detect the attacks also increase.

2) IMPACT OF THE NUMBER OF MALICIOUS NODES
We examine the effect of the number of malicious nodes, m. Fig. 6 shows the simulation result. In these graphs, the x-axis indicates m, and the y-axis indicates the accuracy of the query result in Fig. 6 (a), the traffic in Fig. 6 (b), and the malicious node identification ratio in Fig. 6 (c).

FIGURE 6. Effect of m. (a) Accuracy of the query result. (b) Traffic. (c) Malicious node identification ratio.

From Fig. 6 (a), in our proposed method, regardless of m, the accuracy of the query result is higher than in the naive method. This is because, in our proposed method, each data item is sent back along multiple routes, so that the query-issuing node can acquire the required data items.

From Fig. 6 (b), regardless of m, the traffic is almost constant in both methods because a DRA has no influence on the number of data items in a reply message. In our proposed method for identification, regardless of m, there is little traffic.

From Fig. 6 (c), as m increases, the ratio that no malicious node was identified decreases. This is because, as m increases, the opportunity of DRA increases, and then the query-issuing node has more chances to detect the attacks. Here, even if m is large, it is difficult to identify more than three malicious nodes at a single query. This is because DRAs are sometimes not detectable, as mentioned in Section V-A. Moreover, when replaced data items are replaced again by other malicious nodes, the previous attack is not detectable. In addition, a query-issuing node sometimes cannot inquire about targeted malicious node candidates, because there are other malicious node candidates on the paths to the nodes that have received reply messages from the targeted candidates.

C. GLOBAL-IDENTIFICATION
We compare the performance of our proposed malicious node identification method with that of the simple majority-based method (denoted by majority method). In the majority method, each node determines the node identified by the largest number (≥ λ) of nodes as a malicious node. Then, it discards the information sent by the determined malicious node, and the same procedure among nodes except for the malicious node repeats until there is no node which is identified as malicious by λ or more nodes. Here, λ is a threshold and calculated by λ = IMrec · γ, where IMrec denotes the total number of nodes issuing notification messages and γ denotes a system parameter. Table 7 shows the threshold setting in the methods for global identification.

TABLE 7. System parameters regarding thresholds.

We evaluate the following criteria for each method where each node performs global identification after receiving queries NumQuery times and this process is repeated 100 times.
• Number of identified malicious nodes: the average number of identified malicious nodes over the 100 runs of global identification.

• Rate of misidentification: the rate that a normal node is identified as a malicious node in 100 times.

1) IMPACT OF QUERY-ISSUING TIMES, NumQuery
We examine the effect of the query-issuing times, NumQuery. Fig. 7 shows the simulation results. In these graphs, the x-axis indicates NumQuery, and the y-axis indicates the number of the identified malicious nodes in Fig. 7 (a) and the rate of the misidentification in Fig. 7 (b). In the graphs of the number of identified malicious nodes, MN-5 indicates that there are five malicious nodes in the network and MN-5 + LN-5 indicates that there are five malicious nodes and five liar nodes in the network.

FIGURE 7. Effect of NumQuery. (a) Number of identified malicious nodes. (b) Rate of the misidentification.

From Fig. 7 (a), as NumQuery increases, the number of identified malicious nodes increases in both methods because the number of nodes identified by local identification increases. In our proposed method, more malicious nodes are identified with fewer queries than in the majority method. This shows the effectiveness of our method, which combines majority-based and similarity-based approaches. By similarity-based grouping, our method can classify collaborating malicious nodes into one group, which reduces the influence of FNAs. Meanwhile, even if the number of liar nodes, LN, increases, the number of identified nodes does not change much in both methods. This is because, in the majority method, nodes conclusively identify malicious nodes simply by the number of nodes identifying those nodes as malicious based on the threshold. However, as Fig. 7 (b) shows, this leads to misidentification. On the other hand, in our proposed method, as mentioned above and also shown in Fig. 7 (b), FNAs have less influence even if the number of LNs increases.

From Fig. 7 (b), as NumQuery increases, the rate of misidentification decreases in both methods. In the majority method, the influence of FNAs decreases because of the increase of information on correctly identified malicious nodes. On the other hand, in our proposed method, as NumQuery increases, the number of groups consisting of normal nodes increases, and thus, FNAs have lower influence. Fig. 7 (b) also shows that LNs have a significant impact on the majority method, while little on our method. In the majority method, it is obvious that as the number of LNs increases, the number of normal nodes identified as malicious increases. On the other hand, in our proposed method, as mentioned, similarity-based grouping reduces the influence of FNAs.

2) IMPACT OF THE NUMBER OF MALICIOUS NODES
We examine the effect of the number of malicious nodes, m. In this experiment, there is no LN in the network. Fig. 8 shows the simulation result. In these graphs, the x-axis indicates the number of malicious nodes, m, and the y-axis indicates the number of the identified malicious nodes in Fig. 8 (a) and the rate of misidentification in Fig. 8 (b).

FIGURE 8. Effect of m. (a) Number of identified malicious nodes. (b) Rate of the misidentification.

From Fig. 8 (a), as m increases, the difference in the number of identified malicious nodes between our method and the majority method increases. This is because, in our proposed method, as m increases, the number of malicious nodes identified by normal nodes increases, and thus, the number of groups consisting of normal nodes increases, which helps to identify more malicious nodes.

From Fig. 8 (b), when the number of malicious nodes is 8 or more in the majority method, and 10 or more in our method, the rate of misidentification increases. This is because, as m increases, there are more chances of FNAs. In the majority method, as m increases, the number of identified malicious nodes increases, but misidentification also increases due to static λ. On the other hand, as mentioned, FNAs have little influence on our method even as m increases.

VII. CONCLUSIONIn this paper, we have proposed methods for top-k queryprocessing and malicious node identification based on nodegrouping in MANETs. In order to maintain high accuracyof the query result and detect attacks, nodes reply withk data items with the highest score along multiple routes.After detecting attacks, the query-issuing node narrows downthe malicious node candidates and then tries to identifythe malicious nodes through message exchanges with othernodes.Whenmultiple malicious nodes are present, the query-issuing nodemay not be able to identify all malicious nodes ata single query. It is effective for node to share the informationabout the identified malicious nodes with other nodes. In ourmethod, each node divides all nodes into some groups byusing the similarity of the information about the identifiedmalicious nodes. Then, it identifies malicious nodes based onthe information on the groups.

In this paper, we did not address the issue of identification of liar nodes (LNs). As part of our future work, we plan to design a method to identify LNs, and also to design a message authentication method to prevent malicious nodes from performing FNAs.

REFERENCES

[1] D. Amagata, Y. Sasaki, T. Hara, and S. Nishio, “A robust routing method for top-k queries in mobile ad hoc networks,” in Proc. MDM, Jun. 2013, pp. 251–256.

[2] W.-T. Balke, W. Nejdl, W. Siberski, and U. Thaden, “Progressive distributed top-k retrieval in peer-to-peer networks,” in Proc. ICDE, Apr. 2005, pp. 174–185.

[3] S. Buchegger and J.-Y. Le Boudec, “Performance analysis of the CONFIDANT protocol,” in Proc. MobiHoc, 2002, pp. 226–236.

[4] T. Camp, J. Boleng, and V. Davies, “A survey of mobility models for ad hoc network research,” Wireless Commun. Mobile Comput., vol. 2, no. 5, pp. 483–502, Sep. 2002.

[5] B. Chen, W. Liang, R. Zhou, and J. X. Yu, “Energy-efficient top-k query processing in wireless sensor networks,” in Proc. CIKM, 2010, pp. 329–338.

[6] H. Chan, A. Perrig, and D. Song, “Secure hierarchical in-network aggregation in sensor networks,” in Proc. CCS, 2006, pp. 278–287.

[7] S. Chen, Y. Zhang, Q. Liu, and J. Feng, “Dealing with dishonest recommendation: The trials in reputation management court,” Ad Hoc Netw., vol. 10, no. 8, pp. 1603–1618, Nov. 2012.

[8] P. Dewan and P. Dasgupta, “P2P reputation management using distributed identities and decentralized recommendation chains,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 7, pp. 1000–1013, Jul. 2010.

[9] N. C. Fernandes, M. D. D. Moreira, and O. C. M. B. Duarte, “A self-organized mechanism for thwarting malicious access in ad hoc networks,” in Proc. INFOCOM, 2010, pp. 266–270.

[10] R. Hagihara, M. Shinohara, T. Hara, and S. Nishio, “A message processing method for top-k query for traffic reduction in ad hoc networks,” in Proc. MDM, May 2009, pp. 11–20.

[11] Y.-C. Hu, A. Perrig, and D. B. Johnson, “Ariadne: A secure on-demand routing protocol for ad hoc networks,” in Proc. MobiCom, 2002, pp. 12–23.

[12] Y.-C. Hu, D. B. Johnson, and A. Perrig, “SEAD: Secure efficient distance vector routing for mobile wireless ad hoc networks,” Ad Hoc Netw., vol. 1, no. 1, pp. 175–192, Jul. 2003.

[13] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The Eigentrust algorithm for reputation management in P2P networks,” in Proc. WWW, 2003, pp. 640–651.

[14] K. Liu, J. Deng, P. K. Varshney, and K. Balakrishnan, “An acknowledgment-based approach for the detection of routing misbehavior in MANETs,” IEEE Trans. Mobile Comput., vol. 6, no. 5, pp. 536–550, May 2007.

[15] S. Kurosawa, H. Nakayama, N. Kato, A. Jamalipour, and Y. Nemoto, “Detecting blackhole attack on AODV-based mobile ad hoc networks by dynamic learning method,” Int. J. Netw. Secur., vol. 5, no. 3, pp. 338–346, 2007.

[16] S. J. Lee and M. Gerla, “Split multipath routing with maximally disjoint paths in ad hoc networks,” in Proc. ICC, vol. 10. Jun. 2001, pp. 3201–3205.

[17] Z. Li and H. Shen, “A hierarchical account-aided reputation management system for large-scale MANETs,” in Proc. INFOCOM, Apr. 2011, pp. 909–917.

[18] X. Liu, J. Xu, and W. C. Lee, “A cross pruning framework for top-k data collection in wireless sensor networks,” in Proc. MDM, May 2010, pp. 157–166.

[19] W. Lou, W. Liu, and Y. Fang, “SPREAD: Enhancing data confidentiality in mobile ad hoc networks,” in Proc. INFOCOM, vol. 4. Mar. 2004, pp. 2404–2413.

[20] B. Malhotra, M. A. Nascimento, and I. Nikolaidis, “Exact top-k queries in wireless sensor networks,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 10, pp. 1513–1525, Oct. 2011.

[21] S. Marti, T. J. Giuli, K. Lai, and M. Baker, “Mitigating routing misbehavior in mobile ad hoc networks,” in Proc. MobiCom, 2000, pp. 255–265.

[22] P. Michiardi and R. Molva, “CORE: A collaborative reputation mechanism to enforce node cooperation in mobile ad hoc networks,” in Proc. Adv. Commun. Multimedia Secur., Sep. 2002, pp. 107–121.

[23] T. Tsuda, Y. Komai, Y. Sasaki, T. Hara, and S. Nishio, “Top-k query processing and malicious node identification against data replacement attack in MANETs,” in Proc. MDM, Jul. 2014, pp. 279–288.

[24] M. Srivatsa, L. Xiong, and L. Liu, “TrustGuard: Countering vulnerabilities in reputation management for decentralized overlay networks,” in Proc. WWW, 2005, pp. 422–431.

[25] Y. Sasaki, R. Hagihara, T. Hara, M. Shinohara, and S. Nishio, “A top-k query method by estimating score distribution in mobile ad hoc networks,” in Proc. DMWPC, Apr. 2010, pp. 944–949.

[26] Y. Sasaki, T. Hara, and S. Nishio, “Two-phase top-k query processing in mobile ad hoc networks,” in Proc. NBiS, Sep. 2011, pp. 42–49.

[27] J. Shi, R. Zhang, and Y. Zhang, “Secure range queries in tiered sensor networks,” in Proc. INFOCOM, Apr. 2009, pp. 945–953.

[28] A. D. Wood and J. A. Stankovic, “Denial of service in sensor networks,” Computer, vol. 35, no. 10, pp. 54–62, Oct. 2002.

[29] M. Wu, J. Xu, X. Tang, and W. C. Lee, “Top-k monitoring in wireless sensor networks,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 7, pp. 962–976, Jul. 2007.

[30] Y. Yi, R. Li, F. Chen, A. X. Liu, and Y. Lin, “A digital watermarking approach to secure and precise range query processing in sensor networks,” in Proc. INFOCOM, Apr. 2013, pp. 1950–1958.

[31] C. M. Yu, Y. T. Tsou, C. S. Lu, and S. Y. Kuo, “Practical and secure multidimensional query framework in tiered sensor networks,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 2, pp. 241–255, Jun. 2011.

[32] C.-M. Yu, G.-K. Ni, I.-Y. Chen, E. Gelenbe, and S.-Y. Kuo, “Top-k query result completeness verification in tiered sensor networks,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 1, pp. 109–124, Jan. 2014.

[33] R. Zhang, J. Shi, Y. Liu, and Y. Zhang, “Verifiable fine-grained top-k queries in tiered sensor networks,” in Proc. INFOCOM, Mar. 2010, pp. 1–9.

[34] Y. Zhang, G. Wang, Q. Hu, Z. Li, and J. Tian, “Design and performance study of a topology-hiding multipath routing protocol for mobile ad hoc networks,” in Proc. INFOCOM, Mar. 2012, pp. 10–18.

TAKUJI TSUDA received the B.E. degree in multimedia engineering and the M.E. degree in information science and technology from Osaka University, Osaka, Japan, in 2013 and 2015, respectively. His research interests include distributed databases, mobile networks, and mobile computing systems.

YUKA KOMAI received the B.E. degree in multimedia engineering and the M.E. degree in information science and technology from Osaka University, Osaka, Japan, in 2011 and 2013, respectively, where she is currently pursuing the Ph.D. degree in information science and technology. Her research interests include distributed databases, mobile networks, and mobile computing systems.

TAKAHIRO HARA (SM’98) received the B.E., M.E., and D.Eng. degrees in information systems engineering from Osaka University, Japan, in 1995, 1997, and 2000, respectively. He is currently a Full Professor with the Department of Multimedia Engineering, Osaka University. He has authored over 350 journal and conference papers in the areas of databases, mobile computing, peer-to-peer systems, WWW, and wireless networking. His research interests include distributed databases, peer-to-peer systems, mobile networks, and mobile computing systems. He served as the General Chair of the IEEE SRDS 2014 and Mobiquitous 2016, and the Program Chair of the IEEE MDM’06/10, the IEEE AINA’09/14, and the IEEE SRDS’12. He is an ACM Distinguished Scientist and a member of three other learned societies.

SHOJIRO NISHIO (F’12) received the B.E., M.E., and Ph.D. degrees from Kyoto University, Japan, in 1975, 1977, and 1980, respectively. He became a Full Professor with Osaka University, in 1992, and was conferred the title of Distinguished Professor in 2013. Prior to assuming his position as the President in 2015, he also served a number of positions with Osaka University. He has co-authored and co-edited over 55 books and more than 650 refereed journal or conference papers. His areas of expertise in database systems include concurrency control, knowledge discovery, deductive and object-oriented databases, multimedia systems, and database system architectures for advanced networks, such as broadband networks and mobile computing environment. He is a member of eight learned societies. He has served as a member of the Program or Organizing Committees for more than 100 international conferences, including VLDB, ACM SIGMOD, and the IEEE INFOCOM. He received the Medal with Purple Ribbon from the Emperor of Japan in 2011, and the Distinguished Achievement and Contributions Award in information science and technology from the Minister of Education, Culture, Sports, Science and Technology, Japan, in 2014.
