A P2P Collaborative RFID Data Cleaning Model

Xiaogang Peng, Zhen Ji
Software College of Shenzhen University, Shenzhen, P.R. China, 518060

Zongwei Luo, Edward C. Wong, C. J. Tan
E-Business Technology Institute, G01-G05 Technology Innovation and Incubation Building, The University of Hong Kong, Pokfulam Road, Hong Kong


Abstract

RFID is emerging as one of the key technologies for modernizing logistics and supply chain management. In a typical RFID-enabled logistics and supply chain application, RFID readers detect and identify goods that carry RFID tags. Given the huge volume of goods, reading RFID data successfully becomes a crucial issue. Many algorithms and models have been proposed to improve RFID reading, yet most of them address the problem within a single reading node. In this paper, we introduce a P2P model that identifies and removes inaccurate RFID readings by utilizing the information exchanged among related nodes along the business processing route of each RFID-tagged item. Successful deployment of this model will ease the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. Our simulation indicates that the correct functioning of the RFID network against the business requirements can be ensured while the overall performance of the network is maintained.

1. Introduction

Radio Frequency Identification (RFID) is a technology that allows readers to detect a tagged item without line of sight or contact by using radio-frequency waves. Because of this reading flexibility, RFID has been adopted in a wide range of applications [1,2,3,4]. In a typical RFID-enabled logistics and supply chain application, RFID readers are employed to detect the RFID-tagged items passing each detection node. A supply chain system usually consists of a number of detection nodes. The readings of each detection node are integrated and analyzed to generate logic for higher-level processing such as RFID management and event processing [5]; therefore, a small decrease in the successful read rate of each node adds up to a noticeable drop in the overall system performance in RFID event generation. Furthermore, because the number of RFID-tagged items passing these nodes is always huge [6], a slight decrease in the overall successful read rate causes a large number of misreads, which reduces the accuracy of further data processing such as tracing or event generation. Maintaining a high successful read rate in supply chain systems has therefore become one of the most crucial issues in RFID research.

Unfortunately, according to related research [7,8], the observed read accuracy of RFID data in real-world projects is about 60-70%, which is far from satisfactory. To improve successful read rates, many data cleaning algorithms and mechanisms have been proposed [9,10,11,12]. Most of the proposed data cleaning methods focus on problems within a single node and ignore the fact that tag readings from related detection nodes also provide useful information to assist the data cleaning process. In this paper, we treat the detection nodes as a chain that follows the movement of an RFID-tagged item through the supply chain, and we introduce a P2P model that identifies and removes inaccurate RFID readings by utilizing the information of connected nodes along the processing route of each RFID-tagged item. Successful deployment of this model will also ease the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. Our simulation indicates that the correct functioning of the RFID network against the business requirements can be ensured while the overall performance of the network is maintained.

The 3rd International Conference on Grid and Pervasive Computing - Workshops
978-0-7695-3177-9/08 $25.00 © 2008 IEEE
DOI 10.1109/GPC.WORKSHOPS.2008.12

2. Related Research

Compared with the conventional bar code, RFID excels in flexibility because neither line of sight nor contact is needed, and it has therefore been deployed in many different areas [2, 3, 4, 5]. For example, supply chain management benefits from RFID technology by saving the labor cost of scanning tags, facilitating inventory replenishment decisions and tracing products in the supply chain [18]. Most existing RFID-enabled systems share the same three-level architecture, as shown in Figure 1.

Figure 1. RFID System Architecture

At the first layer, detection nodes are deployed to read RFID tags at different locations according to business requirements. In most systems, multiple readers are used in each detection node to reduce false negative readings. Because the correct read rate of the raw RFID data captured by the readers is comparatively low, usually 60%-70%, a cleaning or “smoothing” [8,9] step is needed. Some data cleaning mechanisms apply to the readers [13, 15, 16], while others require a centralized back-end system [17, 18], that is, the data processing server at layer two in Figure 1, to handle the raw data. On the server, other manipulations such as compression, simple rule association and event generation [19] can also take place before the data enters the enterprise application or database at the third layer.

One of the biggest challenges of RFID data is its volume. As a real-world example, the Wal-Mart RFID trial is reported to generate seven terabytes of data every day [20]. Sending all of this data to a centralized system for data cleaning requires a high-performance server as well as a high-speed network, which inevitably increases the total hardware cost. One solution is to distribute the centralized information to local nodes and then analyze it in a distributed way. In this paper, we propose a P2P model that further reduces the communication overhead by establishing data cleaning relationships among nodes to build data cleaning clusters, as discussed in Section 3.

Unlike the client-server network structure, in which centralized resources are bundled to one or several servers, every node within a P2P network participates equally in services or applications by exchanging messages with other nodes through diverse channels within the network [21]. With this structure, the processing power of each node and the bandwidth of each connection can be better utilized and integrated to accomplish real-time, heavily loaded computation tasks. By modeling the RFID data cleaning task with the P2P network concept, we design a novel P2P model over the RFID detection node network and develop an algorithm to identify and remove erroneous data generated by the readers in the nodes.

Consider the scenario in which tagged items pass an RFID detection node. Besides a successful read, three types of errors can occur: false negative, false positive and redundant reading. Redundant readings can be removed by using time stamps to mark the entry and exit of the tagged item. Then, using the numbers of successful reads, false negatives and false positives, the performance of the node can be evaluated with the measures used in information retrieval [22]. Let TP (short for “true positive”) denote the number of successful reads, and let FN and FP denote the numbers of false negative and false positive cases respectively. The precision rate P of the node is P=TP/(TP+FP), and the recall rate R is R=TP/(TP+FN). In this paper, R measures the capability of each detection node to read the original tags successfully, while P evaluates the performance of each node in avoiding false positive cases. Another measure, F1, formulated as F1=2PR/(P+R), assesses the overall performance of the node by combining precision and recall. In an ideal zero-error case, all three values are 100%.
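The three measures can be computed directly from the per-node counts. The sketch below is only an illustration of the formulas; the function name, the zero-denominator handling and the example counts are assumptions, not part of the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (P, R, F1) for one detection node from its TP, FP and FN counts."""
    # Assumption: a measure with a zero denominator is reported as 0.0.
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Hypothetical node: 70 tags read correctly, 10 phantom reads, 30 missed tags.
p, r, f1 = precision_recall_f1(tp=70, fp=10, fn=30)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")   # prints P=0.875 R=0.700 F1=0.778
```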

[Figure 1 labels: Detection Nodes (layer 1); Data Processing Server (layer 2); Enterprise Application Software and Database or Data Warehouse (layer 3).]



3. The P2P Collaborative RFID Data Cleaning Model

In many RFID-enabled logistics or supply chain management applications, RFID detection nodes are deployed wherever the information of the tagged items needs to be captured. The nodes within a supply chain system physically form a complicated network connected by channels. This scenario can be abstracted as a graph of detection nodes whose edges represent the possible movements of tagged items. When the tag of an item is recognized by a detection node, the information is stored locally. In the normal case, the information is then sent to a centralized server for further data management. In the P2P approach, we define an RFID data exchange network (RDEN) on top of the physical detection node network layer by taking the detection nodes as vertexes and the information exchange between nodes as edges.

The RFID data exchange network (RDEN) is modeled as an undirected graph G(V,E). The vertex set V is the set of RFID detection nodes in a supply chain management system, denoted by V(v1,v2,v3…). There is an edge e∈E connecting detection node v1 with detection node v2 if information is transferred between v1 and v2. Data management is carried out by sending and receiving short messages between vertexes in this P2P network.

For data cleaning purposes, a small number of vertexes in the network are involved to form a data cleaning cluster (DCC). Nodes in the cluster are related by the business processing logic (BPL) for solving RFID data cleaning problems. The method we use to build the DCC from the RDEN is similar to the way the interconnected RFID reader collision model is built upon the reader collision network in [13], which forms the network by exchanging neighboring information to update the routing neighbor table in each node.

The fact that each tagged item can only travel through one path in time order in the supply chain makes it possible to find a directed data cleaning path (DDCP) within the DCC. We assume that the path can be determined before each tagged item enters the system. This assumption holds naturally in some scenarios, such as items moving through an assembly line or goods being transported from a manufacturer to a distribution center and then to a certain retail store. For more sophisticated cases, one solution is the “divide and conquer” method [23], which segments the DCC into simpler sub-networks and applies data aggregation based on the business processing logic or path finding algorithms in graphs.

The DDCP is therefore defined as an ordered set of vertexes <v1, v2, …, vi, …, vn>, where n is the total number of nodes and vi is the i-th node that each tagged item passes. For three consecutive nodes vi-1, vi and vi+1, vi-1 is the ‘PREVIOUS’ node of vi, while vi+1 is the ‘NEXT’ node of vi. When the DDCP is known, the relation of the detection nodes along the DDCP can be utilized in the RFID data cleaning process.

As a tagged item passes a node, the occurrence of the item in this node is recorded if it is read successfully, and this information assists in detecting RFID reading errors at related nodes along the path. For example, if an item is not detected in a node yet is reported by both the previous node and the next node, a false negative can be declared with high probability. By exchanging the RFID reading information between nodes, the false positive and false negative cases in a node can be detected and corrected. As a simple example, consider the DCC with 4 nodes on the left-hand side of Figure 2, in which tagged items pass from node a0 to either a1’ or a1’’ and then go to node a2. If, through information exchange between nodes, items are found in both a0 and a2, then a1’ and a1’’ are regarded as one logical node after applying the BPL, and the data of the two nodes are integrated to form a DDCP, as shown on the right-hand side of Figure 2.
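The four-node example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the tag IDs and the hard-coded rule that a1' and a1'' stand for the same processing step are hypothetical, and the paper does not prescribe any particular data structure.

```python
from collections import defaultdict


def build_rden(edges):
    """Build the undirected graph G(V, E) as an adjacency map."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def merge_reads(*read_sets):
    """Union the tag IDs read by physical nodes that BPL treats as one logical node."""
    merged = set()
    for reads in read_sets:
        merged |= reads
    return merged


def likely_false_negatives(ddcp_reads, i):
    """Tags reported by nodes i-1 and i+1 of the DDCP but missing at node i."""
    return (ddcp_reads[i - 1] & ddcp_reads[i + 1]) - ddcp_reads[i]


# Hypothetical readings for the four-node DCC of Figure 2.
rden = build_rden([("a0", "a1'"), ("a0", "a1''"), ("a1'", "a2"), ("a1''", "a2")])
reads = {"a0": {"t1", "t2", "t3"}, "a1'": {"t1"}, "a1''": {"t2"}, "a2": {"t1", "t2", "t3"}}

# Assumed BPL rule: a1' and a1'' are alternatives for the same processing step.
a1 = merge_reads(reads["a1'"], reads["a1''"])
ddcp_reads = [reads["a0"], a1, reads["a2"]]   # DDCP(a0, a1, a2)
missed = likely_false_negatives(ddcp_reads, 1)
print(sorted(missed))   # prints ['t3']
```

Tag t3 is seen at a0 and a2 but at neither a1' nor a1'', so it is the one flagged as a probable false negative at the merged node.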

Consider the scenario in which tagged items move from a starting point a0 towards an ending point an along a DDCP found in a DCC by using business logic. Each tag is detected with a certain successful read rate ri by each detection node ai as it passes. Each node ai holds a table of RFID records in the classic form Ti<TagID, LOCATION, TIMESTAMP> [14]. For collaborative data cleaning, we introduce another table with the following structure: TCi<TagID, PRE, CURRENT, NEXT, STATUS>. In each row of the table, TagID records the RFID of the tagged item. The binary values in PRE, CURRENT and NEXT represent the detection of the item in the previous, current and next node respectively, with ‘1’ for detected and ‘0’ otherwise. The STATUS column indicates the data cleaning result for this item.
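One possible in-memory representation of a TCi row is shown below. The class name, the field names and the empty initial STATUS are assumptions made for illustration; the paper only specifies the five columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TCRecord:
    """One row of TCi<TagID, PRE, CURRENT, NEXT, STATUS>."""
    tag_id: str
    pre: int = 0       # 1 if the previous node detected the item, else 0
    current: int = 0   # 1 if this node detected the item, else 0
    nxt: int = 0       # NEXT column; renamed to avoid shadowing Python's next()
    status: Optional[str] = None   # assumption: empty until a cleaning phase sets it

    def pattern(self) -> str:
        """Return the "PRE,CURRENT,NEXT" pattern used by the correction phases."""
        return f"{self.pre},{self.current},{self.nxt}"


# The table itself is sketched as a dict keyed by TagID (hypothetical tag ID).
tc_i = {"TAG-001": TCRecord("TAG-001", pre=1, current=0, nxt=1)}
print(tc_i["TAG-001"].pattern())   # prints 1,0,1
```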

Figure 2. DDCP finding in a DCC by BPL

The P2P collaborative data cleaning process can be divided into three phases: the initialization phase, the local correction phase and the peer correction phase. In the initialization phase, when a tagged item is detected, a record is inserted into TCi with the captured RFID in the TagID column; at the same time, the value in the CURRENT column is set to “1”, indicating that the item is recognized in this node. The detection information is then sent to the previous and next nodes to update the values in the corresponding columns of the rows with the same RFID in tables TCi-1 and TCi+1.
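The initialization phase can be sketched as follows. Several things here are assumptions rather than the authors' design: the Node class and its method names are hypothetical, a direct method call stands in for the short message exchanged between peers, and a neighbor that has no row yet for the tag creates one on receipt of the update.

```python
class Node:
    """A detection node holding its TC table as a dict keyed by TagID."""

    def __init__(self, name):
        self.name = name
        self.prev = None   # PREVIOUS node on the DDCP
        self.next = None   # NEXT node on the DDCP
        self.tc = {}

    def row(self, tag_id):
        # Assumption: a missing row is created with all flags set to 0.
        return self.tc.setdefault(
            tag_id, {"PRE": 0, "CURRENT": 0, "NEXT": 0, "STATUS": None})

    def detect(self, tag_id):
        """Record a local read and notify both neighbors on the DDCP."""
        self.row(tag_id)["CURRENT"] = 1
        if self.prev is not None:
            self.prev.row(tag_id)["NEXT"] = 1   # we are the neighbor's NEXT node
        if self.next is not None:
            self.next.row(tag_id)["PRE"] = 1    # we are the neighbor's PREVIOUS node


def make_ddcp(names):
    """Create nodes and link them in DDCP order."""
    nodes = [Node(n) for n in names]
    for left, right in zip(nodes, nodes[1:]):
        left.next, right.prev = right, left
    return nodes


a0, a1, a2 = make_ddcp(["a0", "a1", "a2"])
a0.detect("TAG-001")
a2.detect("TAG-001")   # a1 missed the tag in this hypothetical run
print(a1.tc["TAG-001"])
```

After these two reads, a1 holds the pattern "1,0,1" for the tag even though it never detected it, which is the situation the local correction phase acts on.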

Table 1. Local correction phase of node ai.

PRE  CUR  NEXT  STATUS  Actions taken
0    0    1     Pause   Send -FP to node ai+1
0    1    0     Pause   None
0    1    1     Pause   Send +FN to node ai-1
1    0    0     Pause   Send +FP to node ai-1
1    0    1     C1      Change CURRENT to 1, then send +C1 to node ai-1 and -C1 to node ai+1
1    1    0     Pause   None
1    1    1     OK      None

The local correction phase follows the initialization phase. In node ai, the program checks the values in the PRE, CURRENT and NEXT columns of TCi. Based on the combination of these three values, node ai updates the value in the STATUS column and sends the corresponding messages to ai+1 and ai-1. Only one correction decision can be made in this phase: the pattern “1,0,1”, which represents a false negative in ai with high probability, triggers the action of assigning 1 to CURRENT to correct the false negative in this node, and the STATUS is set to C1. The updates of the STATUS column and the actions taken for the different patterns are summarized in Table 1.
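The rules of Table 1 lend themselves to a lookup table. The sketch below encodes the seven patterns as read from Table 1; the dictionary layout, the use of an offset (-1 for ai-1, +1 for ai+1) to name the receiver, and the handling of the unlisted pattern "0,0,0" are assumptions.

```python
# (PRE, CURRENT, NEXT) -> (STATUS, [(message, receiver offset), ...])
LOCAL_RULES = {
    (0, 0, 1): ("Pause", [("-FP", +1)]),
    (0, 1, 0): ("Pause", []),
    (0, 1, 1): ("Pause", [("+FN", -1)]),
    (1, 0, 0): ("Pause", [("+FP", -1)]),
    (1, 0, 1): ("C1", [("+C1", -1), ("-C1", +1)]),
    (1, 1, 0): ("Pause", []),
    (1, 1, 1): ("OK", []),
}


def local_correction(row):
    """Apply Table 1 to one TCi row in place and return the messages to send."""
    pattern = (row["PRE"], row["CURRENT"], row["NEXT"])
    if pattern not in LOCAL_RULES:
        # Assumption: "0,0,0" is not listed in Table 1, so the row is left untouched.
        return []
    status, messages = LOCAL_RULES[pattern]
    if status == "C1":
        row["CURRENT"] = 1   # the only correction made in this phase
    row["STATUS"] = status
    return messages


row = {"PRE": 1, "CURRENT": 0, "NEXT": 1, "STATUS": None}
outgoing = local_correction(row)
print(row, outgoing)
```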

Figure 3. State transition diagram in the peer correction phase.

In the peer correction phase, correction decisions are made by considering the “PRE, CURRENT, NEXT” patterns as well as the messages received from the previous node and the next node. The detailed operations for each pattern can be found in the state transition diagram in Figure 3. In the diagram, the states are determined by the “PRE, CURRENT, NEXT” patterns. State transitions are triggered by the messages received, denoted as R(‘message’) on the outgoing arrows of the diagram. There are four kinds of messages: FN (the item with the RFID tag is predicted to be a false negative by the sender), FP (the item with the RFID tag is predicted to be a false positive by the sender), C1 (the ‘CURRENT’ value of the sender for the same RFID has been changed to 1) and C0 (the ‘CURRENT’ value of the sender for the same RFID has been changed to 0). A ‘+’ sign before a message shows that the message is sent by the next node, while a ‘-’ sign indicates that the message is from the previous node. Another function, denoted as C(value1, value2) in the diagram, indicates that the ‘CURRENT’ value of the current node is changed from value1 to value2 and that the STATUS column is changed to ‘C0’ or ‘C1’ according to value2. The function S(‘message’) means that the message is sent to the previous node (‘+’ sign) or to the next node (‘-’ sign). A series of actions takes place during a state transition, all of which are indicated on the arrows in Figure 3.

[Figure 2 labels: nodes a0, a1’, a1’’ and a2; data integration by BPL yields DDCP(a0, a1, a2).]

[Figure 3 labels: states 000 (OK), 001, 010, 011, 100, 101, 110 and 111 (OK); arrow labels R(+C0); R(+fn), C(0,1), S(-C1), S(+C1); R(-C1); C(0,1), S(-C1), S(+C1); R(+C1); R(-fp) or R(+fp), C(1,0), S(-C0), S(+C0); R(-C0).]
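The sign convention and the C(value1, value2) function can be made concrete with two small helpers. This sketch covers only the notation described in the text; it does not reproduce the transitions of Figure 3, and the function names and the use of list indexes for node positions are assumptions.

```python
def route(sender_index, message):
    """Return the DDCP index that S(message) delivers to.

    A '+' message travels to the previous node, which therefore sees it as
    coming from its next node; a '-' message travels to the next node.
    """
    sign = message[0]
    if sign == "+":
        return sender_index - 1
    if sign == "-":
        return sender_index + 1
    raise ValueError(f"message must start with '+' or '-': {message!r}")


def change_current(row, value1, value2):
    """C(value1, value2): flip CURRENT and set STATUS to 'C0' or 'C1' to match."""
    if row["CURRENT"] != value1:
        raise ValueError("CURRENT does not hold value1")
    row["CURRENT"] = value2
    row["STATUS"] = "C1" if value2 == 1 else "C0"


# Hypothetical use: node 2 corrects a suspected false positive and tells both peers.
row = {"PRE": 0, "CURRENT": 1, "NEXT": 0, "STATUS": "Pause"}
change_current(row, 1, 0)
targets = [route(2, "-C0"), route(2, "+C0")]
print(row["STATUS"], targets)   # prints C0 [3, 1]
```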

4. Simulation and Results

To evaluate the performance of the proposed model, we developed a simulation system. In the simulation system, the number of detection nodes n, the successful read rate ri of each node ai and the number of RFID tags m are parameters that must be set before the simulation. The system then generates a table of m RFID codes in the form To<TagID, LOCATION, TIMESTAMP> as the original data representing the real-world tagged items. The readings of the tagged items at each detection node are simulated as a table Ti. The RFID data in Ti differs from To because of the preset successful read rate ri of each node. For simplicity, we set the successful read rate of every node to the same value r.

The collaborative data cleaning process starts by initializing the TCi table in each node ai as explained in the previous section. The sending and receiving of data changes the TCi table, and the updated information eventually changes the Ti table as described in the previous section.
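A much-reduced version of such a simulation is sketched below. It is not the authors' simulator: it models only missed reads (no false positives), applies only the “1,0,1” rule of the local correction phase in a single pass, and uses hypothetical names and a fixed random seed, so its numbers are not expected to match the reported results.

```python
import random


def simulate(n_nodes, n_tags, r, seed=0):
    """Return (recall before cleaning, recall after cleaning) averaged over all nodes."""
    rng = random.Random(seed)
    # reads[i][t] is True when node i read tag t; every tag passes every node.
    reads = [[rng.random() < r for _ in range(n_tags)] for _ in range(n_nodes)]
    cleaned = [row[:] for row in reads]
    for i in range(1, n_nodes - 1):          # end nodes lack one neighbor
        for t in range(n_tags):
            if reads[i - 1][t] and reads[i + 1][t] and not reads[i][t]:
                cleaned[i][t] = True         # pattern "1,0,1": fill the missed read

    def recall(table):
        # With no false positives, TP is the number of reads and TP + FN = n * m.
        return sum(sum(row) for row in table) / (n_nodes * n_tags)

    return recall(reads), recall(cleaned)


before, after = simulate(n_nodes=5, n_tags=1000, r=0.6, seed=1)
print(f"recall before: {before:.3f}, after: {after:.3f}")
```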

We randomly generate 1000 RFID tags to test the proposed model in DDCPs with 3, 5 and 7 nodes respectively. For each run, the same DDCP is evaluated under four different successful read rates: 60%, 70%, 80% and 90%. The precision, recall and F1 measure of each node are calculated under each experiment setting. To give an overview of the performance of the whole model in each test case, the average precision, recall and F1 measure after applying the collaborative data cleaning algorithm are calculated and shown in Table 2.

Based on the results, the proposed collaborative data cleaning model improves the performance of each node in all test cases. The largest improvement occurs at the successful read rate of 60%, with an improvement of 29% in average precision, 17% in average recall and 23% in F1 measure. This finding shows that the proposed collaborative model is capable of removing errors at a low successful read rate and therefore eases the demand for high accuracy at each detection node, leading to a lower hardware cost for the system.

Another interesting phenomenon is that, when the number of nodes in a system increases, applying the collaborative model slightly improves the recall rate under the same preset successful read rate (the only exception in Table 2 is the step from 5 to 7 nodes at a read rate of 90%). This result contradicts the belief that more nodes in an RFID system cause more chaos in data cleaning, and this characteristic makes the model well suited to real-world implementation. The increase of 5%-8% in the measures under the high successful read rate of 90% also suggests that the model can bring further improvement even to systems that already have a data cleaning mechanism enabled.

Table 2. Average performance on cleaned data

r (%)  n  P_avg   R_avg   F1_avg
60     3  89.71%  76.30%  82.14%
60     5  89.30%  78.06%  83.04%
60     7  89.73%  78.21%  83.39%
70     3  92.88%  82.58%  87.27%
70     5  92.66%  84.92%  88.49%
70     7  92.79%  85.10%  88.69%
80     3  95.74%  88.40%  91.82%
80     5  95.23%  89.70%  92.32%
80     7  96.46%  90.12%  93.15%
90     3  98.16%  94.61%  96.33%
90     5  97.91%  95.08%  96.45%
90     7  98.40%  94.83%  96.57%

5. Conclusion

In this paper, we develop a novel P2P RFID data cleaning model on top of the physical layer of the detection node network in real-world applications. By collaboratively sending and receiving messages between related nodes, the model is capable of detecting and removing false positive and false negative cases automatically to meet the data cleaning goal. Compared with most existing data cleaning mechanisms, the P2P model avoids the computation overhead on a centralized server and the transfer of huge amounts of data, thereby reducing the total network cost.

In our simulation, DDCPs with different numbers of detection nodes are tested at different preset successful read rates, and we achieve improvements in all test cases. In particular, at low read accuracy settings, the improvement in recall rate is more than 15%. These results provide a way to ease the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. We also find that when the number of nodes in a DDCP grows, the recall rate of each node in our simulation actually goes up. This finding indicates the scalability of the proposed model, which makes it well suited to real-world implementation.

References

[1] Siemens to Pilot RFID Bracelets for Health Care. http://www.infoworld.com/article/04/07/23/HNrfid implants1.html (2004).
[2] World’s Third Largest Retailer Completes Warehouse RFID Implementation. http://www.informationweek.com/story/showArticle.jhtml?articleID=57702741 (2005).
[3] Tesco Pushes on with Full-scale RFID Rollout. http://www.computing.co.uk/news/1160636 (2005).
[4] R. B. Ferguson, “Logan Airport to Demonstrate Baggage, Passenger RFID Tracking”, eWeek, 2006.
[5] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma, “Managing RFID data”, Proceedings of the 30th VLDB Conference, 2004, pp. 1189-1195.
[6] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, “WinRFID – A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications”, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, 2005.
[7] C. Floerkemeier and M. Lampe, “Issues with RFID usage in ubiquitous computing applications”, Pervasive Computing: Second International Conference, PERVASIVE, 2004.
[8] S. Jeffery, M. Garofalakis and M. Franklin, “Adaptive cleaning for RFID data streams”, Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), 2006, pp. 163–174.
[9] S. Jeffery, G. Alonso, M. Franklin, W. Hong, and J. Widom, “Declarative support for sensor data cleaning”, Pervasive, 2006.
[10] J. Waldrop, D. W. Engels and S. E. Sarma, “Colorwave: a MAC for RFID reader networks”, IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, Louisiana, USA, 2003, pp. 1701–1704.
[11] Daniel W. Engels, “The reader collision problem, AUTO-ID Center Whitepaper”, http://autoid.mit.edu/whitepapers/MIT-AUTOID-WH-007.PDF, 2002.
[12] D. W. Engels, S. E. Sarma, “The reader collision problem”, IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, 2002.
[13] Shijie Zhou, Zongwei Luo, Edward Wong, CJ Tan, “Interconnected RFID Reader Collision Model and its Application in Reader Anti-collision”, IEEE RFID 2007, Texas, USA, 2007.
[14] Shan R., Orlowska M., Li X., “RFID Data Management: Challenges and Opportunities”, IEEE First International Conference on RFID, Grapevine, Texas, USA, 26-28, 2007, pp. 175-182.
[15] Yuan, Y.; Yang, Z.; He, Z.; He, J., “Taxonomy and survey of RFID anti-collision protocols”, Computer Communications, 29 (2006), pp. 2150–2166.
[16] Su-Ryun Lee, Sung-Don Joo, Chae-Woo Lee, “An enhanced dynamic framed slotted ALOHA algorithm for RFID tag identification”, The Second Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2005), San Diego, CA, USA, 2005, pp. 166-174.
[17] M. J. Franklin, et al., “Design Considerations for High Fan-In Systems: The HiFi Approach”, CIDR, 2005.
[18] Laurie Sullivan, “RFID Implementation Challenges Persist, All This Time Later”, Information Week, Oct 2005.
[19] Richard Cocci, Yanlei Diao, and Prashant Shenoy, “SPIRE: Scalable Processing of RFID Event Streams”, Proceedings of the 5th RFID Academic Convocation, April 2007.
[20] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, “WinRFID – A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications”, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, December 2005.
[21] A. Oram (ed.), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly & Associates, March 2001.
[22] Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7, 205-229.
[23] Ram Swaminathan, “Divide-and-conquer algorithms for graph layout problems”, Networks 28(2):69–85, 1996.
[24] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma, “Managing RFID data”, Proceedings of the 30th VLDB Conference, 2004, pp. 1189-1195.
