Report_Summer

Summer Research Internship 2014

Privacy Preserving and Integrity Protecting Data Aggregation in Wireless Sensor Networks

Rutvij Shah (ID-201101042)

Guide: Prof. Manik Lal Das

Abstract:

Wireless sensor networks (WSNs) typically comprise a large number of sensors deployed at random for detection and monitoring tasks. Data aggregation is one of the most important tasks in WSNs because it avoids transmitting redundant data to the base station. Deploying wirelessly communicating sensor nodes in hostile or unattended environments makes attacks easier, and the limited resources of the nodes make conventional security algorithms infeasible. WSNs therefore always have to deal with the problem of privacy preservation and integrity protection in data aggregation.

Privacy preservation means that a node should not reveal its raw data to any other node, to the aggregator in the network, or to nodes outside the network. Maintaining the privacy of data ensures that it is accessed only by its intended users, which protects personal data.

Integrity protection guarantees that data is not tampered with during aggregation in a sensor network, that is, the data received is identical to the data transmitted. Data might be changed by an attacker or corrupted accidentally. Data is valuable only when it is reliable, so validation and verification are used to ensure its integrity.

Study: We studied the following algorithm and improved its integrity protection of data.

Consider a wireless sensor network in which a number of sources collect and produce data. Two or more of these sources hold private data that must be aggregated by a third party without its content being revealed. The sources do not want to disclose the data they collect, so the aggregation is delegated to an aggregator (which may be the third party). Because the sources do not trust the aggregator, the data must also be hidden from it: the data needs to be both secured and privacy protected.

In this scheme, privacy is preserved through a randomization process, and security is provided by a random key pre-distribution method.

The proposed scheme therefore has two parts:

1. Secure key management.

2. Privacy preservation.

Consider the following network scenario:

[Figure: three sources (Source 1, Source 2, Source 3), each holding its own data.]

When the service provider (the server) sends a query to these sources, the sources collect the data and send it back to the server. The sources never want their data to be revealed, so they never send it in raw form; instead they apply a perturbation technique to the raw data before sending it. As a result, the server cannot identify the individual data and can only aggregate what it receives from the n sources.

Secure Key Distribution and Key Establishment Phase:

Every source node stores K keys:

K-k keys are shared with the server/aggregator for secure source-to-aggregator communication.

k keys are used for source-to-source communication.

The secure key distribution method has two parts :

1. Aggregator to source key exchange

2. Source to source key exchange

[Figure: an aggregator connected to Source 1, Source 2 and Source 3.]

Aggregator to source key establishment:

Each source node has K-k keys shared with the server. The problem is that all the source nodes possess the same keys, so communication between a source node and the server under a shared key is insecure: any malicious source node can read the communication between the server and the other source nodes and can launch an attack very easily.

Solution:

To avoid this, the source-aggregator key bank is permuted in the pre-distribution phase, so that the keys are reordered differently for each source-aggregator pair. The ordering for each node is stored in the server. The source node then communicates with the server using one of its shared keys. For communication and data aggregation, the source node first generates a random number Rn between 1 and K-k and sends it to the server in plain text. The server then knows that the source node will encrypt the next message with the Rn-th key of its key bank. The source node repeats these steps every time it wants to communicate with the server and send data.

How to communicate this random number to the server?

The random number Rn is sent to the server in plain text. If a malicious node sends a different random number, no harm is done, because the order of the key bank is randomized: the Rn-th key is different in different source nodes. The mapping is stored in the server offline during the pre-distribution phase. This completes the key establishment phase.
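
The key selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 128-bit key size, the key bank size of 8 and the node labels are all hypothetical, and Python's `random` module stands in for a proper cryptographic random source.

```python
import random

def make_key_bank(size, rng):
    """Hypothetical key bank: `size` random 128-bit keys held by every source."""
    return [rng.getrandbits(128) for _ in range(size)]

def make_permutation(size, rng):
    """Per-node ordering of the key bank, fixed offline before deployment."""
    order = list(range(size))
    rng.shuffle(order)
    return order

def select_key(key_bank, order, rn):
    """Return the Rn-th key (1-based) of a node's permuted key bank."""
    return key_bank[order[rn - 1]]

rng = random.Random(2014)            # seeded only to make the sketch repeatable
key_bank = make_key_bank(8, rng)     # stands for the K-k source-aggregator keys
# The server stores one ordering per source node (assumed node labels).
server_table = {node: make_permutation(8, rng) for node in ("s1", "s2", "s3")}

# Source s1 picks Rn, announces it in plain text, and encrypts with this key.
rn = rng.randint(1, 8)
key_at_source = select_key(key_bank, server_table["s1"], rn)
# The server looks up the ordering stored for s1 and derives the same key.
key_at_server = select_key(key_bank, server_table["s1"], rn)
```

Because each node has its own ordering, an eavesdropper that sees Rn learns only an index into an ordering it does not hold.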

Source to Source key establishment :

The assumption is that the source-to-aggregator/server key has been securely established. Source-to-source key establishment is therefore carried out through the server only, not directly between the sources.

Another Problem:

In this scenario the k keys are the same for every node, so it is easy for any source node to read the data of other source nodes; for example, source node 3 can decrypt what source node 1 and source node 2 are communicating.

The solution is the same as the one discussed for the source-to-aggregator case. Source node 1 and source node 2 separately permute the order of the key bank of k keys dedicated to source-to-source communication. They then pass their permute functions to each other through the server, using their pair-wise keys with the server. After the permute functions have been delivered successfully, one of the source nodes (source node 1, for example) sends a random number between 1 and k to the other source node (source node 2), which indicates a particular key of the permuted key bank. This pair-wise key between the source nodes is used for subsequent communication until the data aggregation is complete. The same key establishment procedure is followed for the next round of data aggregation.

Privacy Preservation: The wireless sensor network contains n source nodes. Each source i owns a value xi, its raw data, which is not shared with other parties or nodes. Suppose that the sum lies in the range [0, K]. Our aim is to compute X, the sum of the raw data xi for i = 1, 2, ..., n, while keeping the individual data and the sum private and secure among the nodes as well as from the server.

The server starts the process. It randomly chooses one of the source nodes, denoted s1, and signals it to initiate the process. This node holds its private data x1 and generates a random number r1 in the range [0, K]. It then computes R1:

R1 = (r1 + x1) mod K

After computing R1, the source node s1 performs neighbourhood discovery to find out the other source nodes it is connected to. s1 passes this information to the server.

The server keeps track of the nodes that have already participated. If some of the source nodes connected to s1 have not yet participated, the server randomly chooses one of them and sends that choice to s1. Let this next source node be s2. Accordingly, s1 passes R1 to s2.

s2 then computes R2 = (R1 + x2) mod K.

s2 follows the same procedure as s1 and sends R2 to s3.

In this way sn is reached, which computes Rn:

Rn = (R(n-1) + xn) mod K

When the server learns that all the nodes have participated, it asks the last node to send Rn to it. The server then passes Rn to the first source node s1 and tells it to compute the summation X = (Rn - r1) mod K, which equals the sum of all xi, that is, the summation of the individual nodes' data. The server then either sends this result onward or keeps it for itself.
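
The round described above can be simulated in a few lines. This is a minimal sketch rather than the report's implementation: the function name `private_sum`, the sample readings and K = 1000 are assumptions, r1 is drawn from 0 to K-1, and the true sum is assumed to be smaller than K so that the modular result is unambiguous. Key establishment and encryption are left out.

```python
import random

def private_sum(values, modulus, rng):
    """Simulate one aggregation round; returns (X, list of R values passed on)."""
    r1 = rng.randrange(modulus)              # secret random number of s1
    running = (r1 + values[0]) % modulus     # R1 = (r1 + x1) mod K
    passed = [running]
    for x in values[1:]:
        running = (running + x) % modulus    # Ri = (R(i-1) + xi) mod K
        passed.append(running)
    total = (running - r1) % modulus         # X = (Rn - r1) mod K, done by s1
    return total, passed

K = 1000                        # assumed modulus, larger than the true sum
x = [12, 7, 30, 5]              # hypothetical private readings x1..x4
total, passed = private_sum(x, K, random.Random(1))
```

Each intermediate value in `passed` is offset by the secret r1, so a single node that sees one of them cannot read off the partial sum.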


Security Analysis:

The assumption is that the data shared by the nodes is correct. If there is no collusion, a node that has contributed its data and knows only its own raw value can compute nothing more than the sum of all the other nodes' values, i.e. (X - xi) mod K. However, if two or more source nodes collude, they can disclose more information. For example, if the two neighbours of node i (that is, parties i − 1 and i + 1) collude, they can learn xi = (Ri − R(i-1)) mod K.

To avoid this, the server (aggregator) is asked to choose the next node, which it does by picking from the eligible neighbours.
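
The collusion case can be checked with a small worked example. The numbers below (K = 1000, r1 = 417 and the four readings) are made up for illustration, and `recover_by_collusion` is a hypothetical helper name.

```python
def recover_by_collusion(r_prev, r_curr, modulus):
    """What neighbours i-1 and i+1 learn together: xi = (Ri - R(i-1)) mod K."""
    return (r_curr - r_prev) % modulus

K = 1000
r1 = 417                    # hypothetical secret random number of s1
x = [12, 7, 30, 5]          # hypothetical private readings x1..x4

# Values passed along the chain: R1, R2, R3, R4.
R = []
running = r1
for xi in x:
    running = (running + xi) % K
    R.append(running)

# s2 sent R2 to s3, and s4 received R3 from s3. Pooling both exposes x3.
leaked_x3 = recover_by_collusion(R[1], R[2], K)
```

Here R is [429, 436, 466, 471], so the colluding pair obtains 466 - 436 = 30, which is exactly x3, without ever learning r1.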

Another Interesting Case:

Collusion is also possible by bypassing the server: a source node can send its computed Ri directly to a colluding node. To handle this case the scheme needs to be slightly modified so that source-to-source communication goes strictly via the server, although this increases the communication overhead.

[Figure: chain of source nodes computing R1 = r1 + x1, R2 = R1 + x2, ..., Rn = R(n-1) + xn.]

A Vulnerability: Server Trust Issues

The server may attempt to learn a source's private data. This can be done in the following way:

1) s1 passes the information about its neighbours to the server so that R1 can be forwarded to other sources. The server may declare that the neighbours of node s1 have already participated, so that no nodes are left, and then ask s1 to send its data for the computation of SUM. The value that s1 returns is then its own private data. In this way the server can learn the private data of each and every node by choosing each in turn as both the initiator and the terminator node. To prevent this, each time the server asks the initiator source node to compute the SUM value, the initiator compares the result with its private value; if the two are the same, the initiator sends the server the message “operation cannot be performed”.
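
The initiator-side check can be sketched as below. The function name `initiator_compute_sum` and the numeric values are assumptions chosen for illustration; the refusal is modelled as an exception carrying the message quoted above.

```python
def initiator_compute_sum(r_last, r1, x1, modulus):
    """Final step at s1, refusing any result that equals its own private value."""
    total = (r_last - r1) % modulus      # X = (Rn - r1) mod K
    if total == x1:
        raise RuntimeError("operation cannot be performed")
    return total

K, r1, x1 = 1000, 417, 12                # hypothetical modulus, secret and reading

# Honest round: the last node of a four-node chain reported Rn = 471.
honest_total = initiator_compute_sum(471, r1, x1, K)

# Attack: the server claims no neighbours are left and hands R1 straight back.
R1 = (r1 + x1) % K
try:
    initiator_compute_sum(R1, r1, x1, K)
    refused = False
except RuntimeError:
    refused = True
```

One side effect of this simple comparison is that a legitimate round in which all the other readings happen to add up to 0 mod K would also be refused.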

How to preserve integrity along with privacy in this protocol?

We have analysed and tried to implement two techniques:

1) Integrity protection through perturbation technique.

2) Integrity protection through shuffling and aggregation tree.

1) Integrity protection through perturbation technique:

First, the server or base station (BS) sends the query to all the source nodes. After receiving the query from the BS, each sensor node customizes its data into a complex number by combining its sensitive data with a private real number and adjoining an imaginary number to it. The scheme uses the additive property of complex numbers to check integrity in data aggregation and to achieve privacy from other trusted participating nodes as well as from adversaries. Of the two parts of the complex number, the real part is used for privacy preservation and the imaginary part for integrity checking. Every node shares two keys: one with the master device and one with the sensor nodes lying on the aggregation tree. This protocol requires considerable memory at each node.

Each node encrypts and sends the customized data to its parent node using the shared key between them.

R1 = [(r1 + x1) + ci] mod K

This data is sent to the other nodes as discussed in the privacy preservation part. The customized data is aggregated (i.e. summed) using the additive property of complex numbers and sent to the sink after encryption.

After the entire privacy preservation process, the server/aggregator holds the aggregated value. To obtain the actual data and check its integrity, the base station first separates the real part and the imaginary part of the sum. It recovers the data from the real part by subtracting the real seed values from it. The sink then compares the imaginary part of the sum it received with the sum of the imaginary parts of the individual nodes. If the two do not differ significantly, the data has not been tampered with and privacy has been preserved.
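
The real/imaginary split can be illustrated with Python's built-in complex type. This is a simplified sketch under several assumptions: the helper names and seed values are invented, the base station is assumed to know every node's real and imaginary seed, and the mod K reduction and the encryption between nodes are omitted.

```python
def customize(x, seed_real, seed_imag):
    """Node side: hide reading x behind a real seed and adjoin an imaginary seed."""
    return complex(seed_real + x, seed_imag)

def verify_and_extract(aggregate, seeds_real, seeds_imag, tol=1e-9):
    """Base station side: check the imaginary part, then strip the real seeds."""
    intact = abs(aggregate.imag - sum(seeds_imag)) <= tol
    data_sum = aggregate.real - sum(seeds_real)
    return intact, data_sum

readings = [12, 7, 30]            # hypothetical private readings
seeds_real = [101, 202, 303]      # hypothetical private real seeds
seeds_imag = [5, 9, 4]            # hypothetical imaginary seeds

customized = [customize(x, a, b)
              for x, a, b in zip(readings, seeds_real, seeds_imag)]
aggregate = sum(customized, 0j)   # additive property of complex numbers

intact, data_sum = verify_and_extract(aggregate, seeds_real, seeds_imag)

# A modification that disturbs the imaginary part is caught by the check.
tampered_intact, _ = verify_and_extract(aggregate + complex(0, 2),
                                        seeds_real, seeds_imag)
```

In this simplified form the check only notices changes that reach the imaginary part; a change confined to the real part would pass it.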

Problem with this scheme: partial participation of nodes

If only some of the nodes participate in data aggregation, the above integrity check fails. The base station does not know which nodes are participating, so instead of subtracting only the imaginary values of the participating nodes, it subtracts the sum of the imaginary values of all nodes.

Solution: We devised a solution to this problem of partial participation of nodes by introducing a bit vector as follows:

0 0 0 0 0

In this sensor network, the length of the bit vector equals the number of nodes a single path can contain. Initially all bits are set to zero, where zero denotes the absence of a node from the data aggregation process. As the vector is passed along the nodes of a specific path in the network, each participating node flips its corresponding bit from 0 to 1 to indicate its presence. The assumption here is that each node knows its bit position. The base station thus finally receives a bit vector indicating which nodes have participated, and it can subtract the corresponding imaginary values. Since a path in the sensor network can contain 32 nodes, the bit vector is at most 32 bits long, so the communication cost is not a problem.
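
A minimal sketch of the participation bit vector, held in a single integer, is given below. The path length of 5, the bit positions and the imaginary seed values are hypothetical, as are the helper names.

```python
def mark_present(bit_vector, position):
    """A participating node flips its own bit from 0 to 1."""
    return bit_vector | (1 << position)

def participants(bit_vector, path_length):
    """Base station side: list the positions whose bit is set."""
    return [p for p in range(path_length) if bit_vector & (1 << p)]

PATH_LENGTH = 5
seeds_imag = [5, 9, 4, 7, 2]      # hypothetical imaginary values, one per node

vector = 0                         # all bits start at zero
for position in (0, 2, 3):         # only these nodes take part in this round
    vector = mark_present(vector, position)

present = participants(vector, PATH_LENGTH)
# Only the imaginary values of the participating nodes are used in the check.
expected_imag = sum(seeds_imag[p] for p in present)
```

With at most 32 nodes per path, the whole vector fits in one 32-bit word.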

Security Problem: There is one major problem with this solution: any malicious node can flip any other node's bit and thus convey incorrect information to the base station. This problem can be solved with the following scheme:

Each node updates its bit and encrypts its position vector with its own key. For example, the position vector of the first node in a path of three nodes is

0 0 1

The node sends the updated bit vector and its own encrypted position vector to the next node in the path. The next node updates its own bit, encrypts its position vector with its own key, and XORs the result with the encrypted position vector received from the previous node. This pattern is repeated until the complete bit vector and the XOR of the encrypted position vectors of all nodes along the path reach the base station. Finally, the base station inspects the received bit vector, computes the XOR of the encrypted position vectors of the nodes marked as present (it knows the key of each node), and compares it with the received XOR value. If the values match, the data is assumed not to have been tampered with and is accepted; otherwise it is rejected.
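
The XOR chain can be sketched for a path of three nodes as follows. The report does not name a cipher, so this sketch swaps in HMAC-SHA256 from the Python standard library as a keyed stand-in for "encrypting the position vector"; that substitution, the demo keys and all function names are assumptions. A keyed hash is sufficient here because the base station only recomputes and compares values and never decrypts them.

```python
import hashlib
import hmac

def tag(key, position, path_length):
    """Keyed value over a node's position vector (stand-in for encryption)."""
    position_vector = (1 << position).to_bytes((path_length + 7) // 8, "big")
    return hmac.new(key, position_vector, hashlib.sha256).digest()

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def node_update(bit_vector, c, key, position, path_length):
    """A node sets its bit and folds its own keyed value into the running XOR."""
    return bit_vector | (1 << position), xor_bytes(c, tag(key, position, path_length))

def base_station_verify(bit_vector, c, keys, path_length):
    """Recompute the XOR for the nodes marked present and compare with c."""
    expected = bytes(32)
    for position, key in enumerate(keys):
        if bit_vector & (1 << position):
            expected = xor_bytes(expected, tag(key, position, path_length))
    return hmac.compare_digest(expected, c)

keys = [b"k1-demo", b"k2-demo", b"k3-demo"]   # hypothetical per-node keys
vector, c = 0, bytes(32)
for position in range(3):
    vector, c = node_update(vector, c, keys[position], position, 3)

accepted = base_station_verify(vector, c, keys, 3)
# A malicious node clears the bit of node 2 without knowing key k2.
forged = base_station_verify(vector & ~(1 << 1), c, keys, 3)
```

Flipping another node's bit changes the set of keyed values the base station XORs together, so the recomputed value no longer matches the received one.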

[Figure: integrity check for a path of three nodes (1, 2, 3) ending at the base station (B.S). Node 1 sends bit vector {001} with c1 = {001}k1. Node 2 sends {011} with c = c1 XOR c2, where c2 = {010}k2. Node 3 sends {111} with c = c1 XOR c2 XOR c3, where c3 = {100}k3. The B.S computes c1, c2 and c3 from the bit vector, XORs them to obtain c, and compares.]

Another interesting study: Creation of Energy Efficient tree in Wireless Sensor Networks

Most wireless sensor networks use spanning trees to aggregate data efficiently. When sensor nodes sense data, the relevant data must be forwarded to the sink. The sink is the root of the spanning tree, and all the other nodes that sense the event make up the tree. Each intermediate node aggregates its own data with the data sent by its children and then transmits the aggregated data to its parent. This procedure continues until the data arrives at the sink.

Parameters related to protocol:

1) To compute the aggregation tree, the energy consumed in the intermediate nodes for sending data from the leaves to the sink must be considered.

2) The tree's delay, which is equal to the tree's depth, should be considered.

3) Scheduling mechanism and queuing delay can be considered as aggregation tree evaluation parameters.

4) To decrease the number of failed nodes and to increase the network lifetime, both remaining energy and distance parameters are considered.

The first parameter to be considered is the remaining energy of each node.

The distance between nodes is the second parameter: each node selects as its parent the neighbour with the most energy. If several neighbours have equal energy, the neighbour at the least distance is selected. With this strategy, a node with low remaining energy can stay alive longer, which increases the lifetime of the network and supports better coverage.

To provide fairness in energy consumption, a third parameter is considered in addition to residual energy and distance: the maximum number of children permitted. In the proposed algorithm, each node can have a predetermined maximum number of children.
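
The three selection rules (most residual energy, then least distance, subject to a cap on children) can be combined in a small helper. This is an illustrative sketch: `choose_parent`, the tuple layout and the sample neighbours are assumptions and are not taken from the cited algorithm.

```python
def choose_parent(neighbours, child_count, max_children):
    """Pick a parent from (node_id, residual_energy_J, distance_m) tuples.

    Neighbours that already have max_children children are skipped. Among the
    rest, the highest residual energy wins and ties go to the nearest node.
    """
    eligible = [n for n in neighbours
                if child_count.get(n[0], 0) < max_children]
    if not eligible:
        return None
    best = min(eligible, key=lambda n: (-n[1], n[2]))
    return best[0]

# Hypothetical neighbourhood of one node: (id, energy in J, distance in m).
neighbours = [(5, 6.0, 3.0), (6, 8.0, 4.0), (7, 8.0, 2.0)]

first_choice = choose_parent(neighbours, {}, max_children=2)
# Once node 7 already has two children, the next best neighbour is used.
second_choice = choose_parent(neighbours, {7: 2}, max_children=2)
```

Nodes 6 and 7 tie on energy, so the nearer node 7 is preferred until its child limit is reached.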

To avoid high power consumption, since transmission power is proportional to distance, the proposed algorithm uses the average path energy as a new parameter. It is calculated as the sum of the residual energy of each node along the path divided by the path length. In any path, the node with the highest energy is chosen as the parent node.

For instance, if

residual energy = 5 J

distance between nodes = 2 m

then parameter = 5/2 = 2.5 J/m

In this way the path is chosen and the tree is formed.
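
The average path energy can be computed as below. The first call reproduces the 5 J over 2 m figure; the two candidate paths, their energies and lengths, and the helper names are hypothetical values added only to show how paths would be compared.

```python
def average_path_energy(residual_energies_j, path_length_m):
    """Sum of residual energies along a path divided by the path length (J/m)."""
    if path_length_m <= 0:
        raise ValueError("path length must be positive")
    return sum(residual_energies_j) / path_length_m

def best_path(candidates):
    """candidates maps a label to (list of residual energies, path length)."""
    return max(candidates,
               key=lambda name: average_path_energy(*candidates[name]))

single_hop = average_path_energy([5.0], 2.0)       # 5 J over 2 m

candidates = {
    "via node 5": ([6.0, 10.0], 4.0),              # assumed energies and length
    "via node 4": ([3.0, 10.0], 4.0),
}
chosen = best_path(candidates)
```

With these assumed values the path via node 5 scores 4.0 J/m against 3.25 J/m and is selected.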

For example:

The remaining energies of nodes 1, 2, 3, 4, 5, 6, 7 and 8 are 10 J, 2 J, 8 J, 3 J, 6 J, 8 J, 7 J and 9 J, respectively. Suppose that node 8 wants to select its parent. Node 5, which has the higher average path energy, is selected as the parent of node 8.

After performing the same procedure for each node, the spanning tree created as a result looks like:

This concludes our research and development work during the summer internship.

Thank You.

References:

1) Arijit Ukil, "Privacy Preserving Data Aggregation in Wireless Sensor Networks", Innovation Labs, Tata Consultancy Services, Kolkata, India. IEEE ICWCMC 2010, Valencia, Spain.

2) Joyce Jose, M. Princy and Josna Jose, "Integrity Protecting and Privacy Preserving Data Aggregation Protocols in Wireless Sensor Networks: A Survey", Dept. of Information Technology, Karunya University, Coimbatore, India.

3) Zahra Eskandari, Mohammad Hossien Yaghmaee and AmirHossien Mohajerzadeh, "Energy Efficient Spanning Tree for Data Aggregation in Wireless Sensor Networks", Department of Computer Engineering, Ferdowsi University of Mashhad; Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506.

4) Marc Lee and Vincent W.S. Wong, "An Energy-Aware Spanning Tree Algorithm for Data Aggregation in Wireless Sensor Networks", Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada.

