A P2P Collaborative RFID Data Cleaning Model
Xiaogang Peng, Zhen Ji
Software College of Shenzhen University,
Shenzhen, P.R. China, 518060.
Zongwei Luo, Edward C. Wong, C. J. Tan
E-Business Technology Institute, G01-G05 Technology Innovation and
Incubation Building, The University of Hong Kong,
Pokfulam Road, Hong Kong
Abstract
RFID is emerging as one of the key technologies for modernizing
logistics and supply chain management. In a typical RFID-enabled
logistics and supply chain application, RFID readers detect and
identify goods that carry RFID tags. Given the huge volume of goods,
reading RFID data successfully becomes a crucial issue. Many
algorithms and models for improving RFID reading have been proposed,
yet most of them address the problem within a single reading node. In
this paper, we introduce a P2P model that identifies and removes
inaccurate RFID readings by using the information exchanged among
related nodes along the business processing route of each RFID-tagged
item. Successful deployment of this model eases the demand for highly
accurate reading at each RFID reading node and thereby reduces the
total cost of the RFID network. Our simulation shows that the RFID
network functions correctly against the business requirements while
its overall performance is preserved.
1. Introduction
Radio Frequency Identification (RFID) is a technology that allows
readers to detect a tagged item without line of sight or contact by
using radio-frequency waves. Because of this reading flexibility, RFID
has been adopted in a wide range of applications [1,2,3,4]. In a
typical RFID-enabled logistics and supply chain application, RFID
readers detect the RFID-tagged items passing each detection node. A
supply chain system usually consists of a number of detection nodes.
The readings of each detection node are integrated and analyzed to
generate logic for higher-level processing such as RFID management and
event processing [5]; therefore, a small decrease in the successful
read rate of each node adds up to a noticeable drop in the overall
performance of RFID event generation. Furthermore, because the number
of RFID-tagged items passing these nodes is always huge [6], a slight
decrease in the overall successful read rate causes a large number of
misreads, which reduces the accuracy of further data processing such
as tracing or event generation. Maintaining high successful read rates
in supply chain systems has therefore become one of the most crucial
issues in RFID research.
Unfortunately, according to related research [7,8], the observed
accurate read rate of RFID data in real-world projects is about
60-70%, which is far from satisfactory. To improve the successful read
rates of RFID, many data cleaning algorithms and mechanisms have been
proposed [9,10,11,12]. Most of the proposed data cleaning methods
focus on problems within a single node and ignore that tag readings
from related detection nodes also provide useful information for the
data cleaning process. In this paper, we treat the detection nodes as
a chain that follows the movement of an RFID-tagged item in a supply
chain, and we introduce a P2P model that identifies and removes
inaccurate RFID readings by using the information of connected nodes
along the processing route of each RFID-tagged item. Successful
deployment of this model also eases the demand for highly accurate
reading at each RFID reading node while reducing the total cost of the
RFID network. Our simulation shows that the RFID network functions
correctly against the business requirements while its overall
performance is preserved.

The 3rd International Conference on Grid and Pervasive Computing - Workshops
978-0-7695-3177-9/08 $25.00 © 2008 IEEE
DOI 10.1109/GPC.WORKSHOPS.2008.12
2. Related Research
Compared with the current bar code, RFID excels in flexibility because
neither line of sight nor contact is needed, and it has therefore been
deployed in many different areas [2, 3, 4, 5]. For example, supply
chain management benefits from RFID technology by saving the labor
cost of scanning tags, facilitating inventory replenishment decisions
and tracing products in the supply chain [18]. Most existing
RFID-enabled systems share the same three-level architecture, as shown
in Figure 1.
Figure 1. RFID System Architecture
At the first layer, detection nodes are deployed to read RFID tags in
different locations according to business requirements. In most
systems, multiple readers are used in each detection node to reduce
false negative readings. Because of the comparatively low correct read
rate, usually 60%-70%, of the raw RFID data captured by the readers, a
cleaning or “smoothing” [8,9] step is needed. Some data cleaning
mechanisms apply to readers [13, 15, 16], and some require a
centralized back-end system [17, 18], that is, the data processing
server at layer two in Figure 1, to handle the raw data. In the
server, other manipulations such as compression, simple rule
association and event generation [19] may also be applied before the
data enters the enterprise application or database at the third layer.
One of the biggest challenges of RFID data is its volume. As a
real-world example, the Wal-Mart RFID trial is reported to generate
seven terabytes of data every day [20]. Sending all this data to a
centralized system for data cleaning requires a high-performance
server as well as a high-speed network, which inevitably increases the
total hardware cost. One solution is to distribute the centralized
information into local nodes and then analyze it in a distributed way.
In this paper, we propose a P2P model that further reduces the
communication overhead by establishing data cleaning relationships
among nodes in order to build data cleaning clusters, which are
discussed in Section 3.
Unlike the client-server network structure, in which centralized
resources are bundled to one or several servers, every node within a
P2P network participates equally in services or applications by
sending messages to and receiving messages from other nodes through
diverse channels within the network [21]. With this structure, the
processing power of each node and the bandwidth of each connection can
be better utilized and integrated to accomplish real-time, heavily
loaded computation tasks. By modeling the RFID data cleaning task with
the P2P network concept, we design a novel P2P model over the RFID
detection node network and develop an algorithm to identify and remove
erroneous data generated by the readers in the nodes.
Consider the scenario in which tagged items pass an RFID detection
node. Besides a successful read, there are three types of errors:
false negative, false positive and redundant reading, as stated in
Section 1. The redundant case can be removed by using timestamps to
mark the entry and exit of the tagged item. Then, using the numbers of
successful reads, false negatives and false positives, the performance
of the node can be evaluated with the measures used in information
retrieval [22]. Let TP (short for “true positive”) denote the number
of successful reads, and let FN and FP denote the numbers of false
negative and false positive cases respectively. The precision rate P
of the node is P = TP/(TP+FP), and the recall rate R is
R = TP/(TP+FN). In this paper, R measures the capability of each
detection node to read the original tags successfully, while P
evaluates the performance of each node in avoiding false positive
cases. Another measure, F1, formulated as F1 = 2PR/(P+R), assesses the
overall performance of the node by combining precision and recall. In
an ideal zero-error case, all three values are 100%.
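As an illustration of the three measures defined above, the following minimal sketch computes P, R and F1 from raw counts. The function name `node_metrics` and the example counts are hypothetical and are not taken from the paper; the zero-division guards are likewise an added convention.

```python
def node_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one detection node."""
    # Guard against empty denominators (an added convention, not from the paper).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 70 correct reads, 5 phantom reads, 30 missed tags.
p, r, f1 = node_metrics(tp=70, fp=5, fn=30)
```

With these counts the recall equals the fraction of real tags that were read (70 of 100), while the precision is lowered only by the five phantom reads.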
[Figure 1 diagram labels: Detection Nodes (first layer); Data Processing Server (second layer); Enterprise Application Software and Database or Data Warehouse (third layer).]
3. The P2P Collaborative RFID Data Cleaning Model
In many RFID-enabled logistics or supply chain management
applications, RFID detection nodes are deployed wherever information
about the tagged items needs to be captured. The nodes within a supply
chain system physically form a complicated network connected by
channels. This scenario can be abstracted as a graph of detection
nodes with edges representing the possible movements of tagged items.
When the tags of the items are recognized by a detection node, the
information is stored locally. In the normal case, the information is
then sent to a centralized server for further data management. In the
P2P approach, we define an RFID data exchange network (RDEN) on top of
the physical detection node network layer by taking the detection
nodes as vertexes and the information exchange between nodes as edges.

The RFID data exchange network (RDEN) is modeled as an undirected
graph G(V,E). The vertex set V is the set of RFID detection nodes in a
supply chain management system, denoted V(v1,v2,v3…). There is an edge
e∈E connecting detection node v1 with detection node v2 if information
is transferred between v1 and v2. The data management process is
carried out by sending and receiving short messages between vertexes
in this P2P network.
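The RDEN definition can be made concrete with a small sketch that builds the undirected graph from a list of observed exchanges. The helper `build_rden`, the adjacency-set representation and the node names are assumptions made for illustration only.

```python
from collections import defaultdict


def build_rden(exchanges):
    """Build an undirected adjacency map from (node, node) exchange pairs."""
    graph = defaultdict(set)
    for u, v in exchanges:
        graph[u].add(v)
        graph[v].add(u)  # undirected: an exchange links both endpoints
    return dict(graph)


# Hypothetical exchange log for four detection nodes.
rden = build_rden([("v1", "v2"), ("v2", "v3"), ("v2", "v4")])
```

An adjacency map keeps the neighbor lookup local to each node, which matches the idea that every vertex only needs to know the peers it exchanges messages with.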
For data cleaning purposes, a small number of vertexes in the network
are involved to form a data cleaning cluster (DCC). Nodes in the
cluster are related by the business processing logic (BPL) for solving
RFID data cleaning problems. The method we use to build the DCC from
the RDEN is similar to the construction of the interconnected RFID
reader collision model upon the reader collision network in [13],
which forms the network by exchanging neighboring information to
update the routing neighbor table in each node.

The fact that each tagged item can only travel through one path in
time order in the supply chain makes it possible to find a directed
data cleaning path (DDCP) within the DCC. We assume that the path can
be determined before each tagged item enters the system. The
assumption holds trivially in some scenarios, such as items moving
through an assembly line or goods transported from a manufacturer to a
distribution center and then to a certain retail store. One solution
for more sophisticated cases is the “divide and conquer” method [23],
which segments the DCC into simpler sub-networks and applies data
aggregation based on the business processing logic or path-finding
algorithms in graphs.

The DDCP is therefore defined as an ordered set of vertexes
<v_1, v_2, …, v_i, …, v_n>, where n is the total number of nodes and
v_i is the i-th node that each tagged item passes. For three
consecutive nodes v_{i-1}, v_i and v_{i+1}, v_{i-1} is the ‘PREVIOUS’
node of v_i while v_{i+1} is the ‘NEXT’ node of v_i. When the DDCP is
known, the relation of the detection nodes along the DDCP can be used
in the RFID data cleaning process. As a tagged item passes a node, the
occurrence of the item in this node is recorded if it is read
successfully, and this information assists in detecting RFID reading
errors for related nodes along the path. For example, if an item is
not detected in a node yet is reported in the previous node and the
next node, a false negative is declared with high probability. By
exchanging RFID reading information between the nodes, false positive
and false negative cases in a node can be detected and corrected.

As a simple example, consider the DCC with 4 nodes on the left-hand
side of Figure 2: tagged items pass from node a0 to either a1’ or a1’’
and then go to node a2. Through information exchange between nodes,
items are found in both a0 and a2; a1’ and a1’’ are then regarded as
one logical node after applying the BPL, and the data of the two nodes
are integrated to form a DDCP, as seen on the right-hand side of
Figure 2.
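The integration of the parallel nodes a1’ and a1’’ into one logical node can be sketched as follows. The paper does not specify how the BPL rule is encoded; here it is assumed to be a plain mapping from physical to logical node names, and `merge_parallel_reads` is a hypothetical helper.

```python
def merge_parallel_reads(reads, merge_map):
    """Fold reads from parallel nodes into one logical node.

    reads: dict mapping node name -> set of TagIDs detected there.
    merge_map: dict mapping physical node -> logical node (assumed BPL rule).
    """
    merged = {}
    for node, tags in reads.items():
        logical = merge_map.get(node, node)
        merged.setdefault(logical, set()).update(tags)
    return merged


# Hypothetical readings: each item goes through exactly one of a1' and a1''.
reads = {"a0": {"t1", "t2"}, "a1'": {"t1"}, "a1''": {"t2"}, "a2": {"t1", "t2"}}
ddcp_reads = merge_parallel_reads(reads, {"a1'": "a1", "a1''": "a1"})
```

After merging, the three logical nodes a0, a1 and a2 form a linear path, so every node has at most one PREVIOUS and one NEXT neighbor.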
Consider the scenario in which tagged items move from starting point
a_0 towards ending point a_n along a DDCP found in a DCC by using
business logic. Each tag is detected with a certain successful read
rate r_i by each detection node a_i as it passes by. Each node a_i
holds a table of RFID records in the classic form
T_i<TagID, LOCATION, TIMESTAMP> [14]. For collaborative data cleaning,
we introduce another table with the following structure:
TC_i<TagID, PRE, CURRENT, NEXT, STATUS>. In each row of the table,
TagID records the RFID of the tagged item. Binary values in PRE,
CURRENT and NEXT represent the detection of the item in the previous,
current and next node respectively, with ‘1’ for detected and ‘0’
otherwise. The STATUS column indicates the data cleaning result for
this item.
Figure 2. DDCP finding in a DCC by BPL
The P2P collaborative data cleaning process can be divided into three
phases: the initialization phase, the local correction phase and the
peer correction phase. In the initialization phase, when a tagged item
is detected, a record is inserted into TC_i with the captured RFID in
the TagID column, and the value in the CURRENT column is set to “1” to
indicate that the item is recognized in this node. The detection
information is then sent to the previous and next nodes to update the
values in the corresponding columns of the rows with the same RFID in
tables TC_{i-1} and TC_{i+1}.
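The initialization phase can be sketched with a small in-memory model of the TC table. The class and attribute names (`TCRow`, `Node`, `nxt`) are hypothetical, direct method calls stand in for the messages exchanged between peers, and creating a neighbor's row on first notification is an assumption the paper does not spell out.

```python
from dataclasses import dataclass


@dataclass
class TCRow:
    tag_id: str
    pre: int = 0      # PRE: detected at the previous node
    current: int = 0  # CURRENT: detected at this node
    nxt: int = 0      # NEXT: detected at the next node
    status: str = ""  # STATUS: filled in by the correction phases


class Node:
    def __init__(self, name):
        self.name = name
        self.tc = {}  # TagID -> TCRow
        self.prev = None
        self.next = None

    def row(self, tag_id):
        # Assumption: a missing row is created on first notification.
        return self.tc.setdefault(tag_id, TCRow(tag_id))

    def detect(self, tag_id):
        self.row(tag_id).current = 1
        if self.prev is not None:
            self.prev.row(tag_id).nxt = 1  # this node is the NEXT of its predecessor
        if self.next is not None:
            self.next.row(tag_id).pre = 1  # this node is the PRE of its successor


def link(nodes):
    """Chain the nodes in DDCP order."""
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a


a0, a1, a2 = Node("a0"), Node("a1"), Node("a2")
link([a0, a1, a2])
a0.detect("tag-1")
a2.detect("tag-1")  # a1 misses the tag
r = a1.tc["tag-1"]
```

In this run the middle node ends up with the pattern PRE=1, CURRENT=0, NEXT=1 even though it never read the tag itself.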
Table 1. Local correction phase of node a_i.

PRE  CUR  NEXT  STATUS  Actions taken
0    0    1     Pause   Send -FP to node a_{i+1}
0    1    0     Pause   None
0    1    1     Pause   Send +FN to node a_{i-1}
1    0    0     Pause   Send +FP to node a_{i-1}
1    0    1     C1      Change CURRENT to 1, then send +C1 to node a_{i-1} and -C1 to node a_{i+1}
1    1    0     Pause   None
1    1    1     OK      None

The local correction phase follows the initialization phase. In node
a_i, the program checks the values in the PRE, CURRENT and NEXT
columns of TC_i. Based on the combination of these three values, node
a_i updates the value in the STATUS column and sends the corresponding
messages to a_{i+1} and a_{i-1}. The only correction decision that can
be made in this phase concerns the pattern “1,0,1”, which indicates a
false negative in a_i with high probability: the node assigns 1 to
CURRENT to correct the false negative and sets STATUS to C1. The
updates of the STATUS column and the actions taken for the different
patterns are summarized in Table 1.
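Table 1 can be read as a lookup from the (PRE, CURRENT, NEXT) pattern to a status and a list of outgoing messages. The sketch below encodes it that way; the names `LOCAL_RULES` and `local_correction` and the "prev"/"next" target labels are hypothetical. The pattern (0, 0, 0) is not listed in Table 1 and is therefore left out.

```python
# (PRE, CURRENT, NEXT) -> (STATUS, [(message, target neighbor), ...])
LOCAL_RULES = {
    (0, 0, 1): ("Pause", [("-FP", "next")]),
    (0, 1, 0): ("Pause", []),
    (0, 1, 1): ("Pause", [("+FN", "prev")]),
    (1, 0, 0): ("Pause", [("+FP", "prev")]),
    (1, 0, 1): ("C1", [("+C1", "prev"), ("-C1", "next")]),
    (1, 1, 0): ("Pause", []),
    (1, 1, 1): ("OK", []),
}


def local_correction(pre, current, nxt):
    """Return (new CURRENT, STATUS, outgoing messages) for one TC row."""
    status, messages = LOCAL_RULES[(pre, current, nxt)]
    if status == "C1":
        current = 1  # the only correction made locally: a likely false negative
    return current, status, list(messages)
```

For example, `local_correction(1, 0, 1)` flips CURRENT to 1 and notifies both neighbors, while every other pattern leaves CURRENT unchanged in this phase.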
Figure 3. State transition diagram in the peer correction phase.

In the peer correction phase, correction decisions are made by
considering the “PRE, CURRENT, NEXT” patterns together with the
messages received from the previous node and the next node. The
detailed operations for each pattern can be found in the state
transition diagram in Figure 3. In the diagram, the states are
determined by the “PRE, CURRENT, NEXT” patterns. State transitions are
triggered by the messages received, which are denoted as R(‘message’)
on the outgoing arrows of the diagram. There are four kinds of
messages: FN (the sender predicts that the item with this RFID tag is
a false negative), FP (the sender predicts that the item with this
RFID tag is a false positive), C1 (the ‘CURRENT’ value of the sender
for the same RFID has been changed to 1) and C0 (the ‘CURRENT’ value
of the sender for the same RFID has been changed to 0). A ‘+’ sign
before a message shows that the message was sent by the next node,
while a ‘-’ sign indicates that the message came from the previous
node. Another function, denoted as C(value1, value2) in the diagram,
means that the ‘CURRENT’ value of the current node is changed from
value1 to value2 and the STATUS column is changed to ‘C0’ or ‘C1’
according to value2. The function S(‘message’) means that the message
is sent to the previous node (‘+’ sign) or to the next node (‘-’
sign). A series of actions takes place on each state transition, all
of which are indicated on the arrows in Figure 3.

[Diagram labels recovered from Figure 2: DCC nodes a0, a1’, a1’’, a2; after data integration by BPL: DDCP(a0, a1, a2).]
[Diagram labels recovered from Figure 3: states 000 (OK), 001, 010, 011, 100, 101, 110, 111 (OK); arrow labels R(+C0), R(-C0), R(+C1), R(-C1), "R(+FN), C(0,1), S(-C1), S(+C1)", "C(0,1), S(-C1), S(+C1)", "R(-FP) or R(+FP), C(1,0), S(-C0), S(+C0)".]
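The notation used on the arrows can be illustrated with two small helpers: one that resolves the target of S(‘message’) from its sign, and one that applies C(value1, value2) to a row. The full transition table of Figure 3 is not reproduced here because it cannot be recovered from the text alone; `route`, `change_current` and the dictionary row layout are hypothetical.

```python
def route(message):
    """Split a signed message such as '+C1' into (target neighbor, kind).

    Following the text, S('+X') goes to the previous node and S('-X') to the
    next node, so the receiver sees '+' from its next node and '-' from its
    previous node.
    """
    sign, kind = message[0], message[1:].upper()
    if sign not in "+-" or kind not in {"FN", "FP", "C0", "C1"}:
        raise ValueError(f"unknown message: {message!r}")
    target = "previous" if sign == "+" else "next"
    return target, kind


def change_current(row, old, new):
    """C(old, new): change CURRENT from old to new and record it in STATUS."""
    if row["CURRENT"] != old:
        raise ValueError("CURRENT does not match the expected old value")
    row["CURRENT"] = new
    row["STATUS"] = f"C{new}"
    return row
```

A transition labelled "C(0,1), S(-C1), S(+C1)" would then correspond to calling `change_current(row, 0, 1)` and sending the two messages to the neighbors returned by `route("-C1")` and `route("+C1")`.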
4. Simulation and Results
To evaluate the performance of the proposed model, we developed a
simulation system. In the simulation system, the number of detection
nodes n, the successful read rate r_i for each node a_i, and the
number of RFID tags m are parameters that must be set before the
simulation. The system then generates a table of m RFID codes in the
form T_o<TagID, LOCATION, TIMESTAMP> as the original data representing
the real-world tagged items. The readings of the tagged items at each
detection node are simulated as a table T_i. The RFID data in T_i
differs from T_o because of the preset successful read rate r_i of
each node. For simplicity, we set the successful read rate of every
node to the same value r. The collaborative data cleaning process
starts by initializing the TC_i table in each node a_i as explained in
the previous section. Sending and receiving data changes the TC_i
table, and the updated information eventually changes the T_i table as
described in the previous section.
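One possible way to generate the per-node readings T_i from the original tags T_o is sketched below. It assumes that each tag is read independently with probability r at every node, so it models only missed reads; the paper does not describe how false positives were injected, and the function name `simulate_reads` and the seed are hypothetical.

```python
import random


def simulate_reads(tag_ids, n_nodes, r, seed=0):
    """Return one set of successfully read TagIDs per detection node."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [{t for t in tag_ids if rng.random() < r} for _ in range(n_nodes)]


tags = [f"tag-{k}" for k in range(1000)]
reads = simulate_reads(tags, n_nodes=3, r=0.6)
# Raw recall of each node before any cleaning: fraction of tags it read.
recalls = [len(node_reads) / len(tags) for node_reads in reads]
```

Under this assumption the raw recall of every node stays close to the preset rate r, which gives a baseline against which the cleaned recall can be compared.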
We randomly generate 1000 RFID tags to test the proposed model in
DDCPs with 3, 5 and 7 nodes respectively. For each run, the same DDCP
is evaluated under four different successful read rates: 60%, 70%, 80%
and 90%. The precision rates, recall rates and F1 measures of each
node are calculated under each experiment setting. To give an overview
of the performance of the whole model in each test case, the average
precision rates, recall rates and F1 measures after applying the
collaborative data cleaning algorithm are calculated and shown in
Table 2.
Based on the results, the proposed collaborative data cleaning model
improves the performance of each node in all test cases. The highest
improvement occurs in the nodes with a successful read rate of 60%,
with an improvement of 29% in average precision, 17% in average recall
and 23% in F1 measure. This finding shows that the proposed
collaborative model is capable of removing errors at a low successful
read rate and will therefore ease the high accuracy demand on each
detection node, leading to a lower hardware cost for the system.
Another interesting phenomenon is that when the number of nodes in a
system increases, the recall rate obtained with the collaborative
model improves slightly under the same preset successful read rate.
This result contradicts the belief that more nodes in an RFID system
cause more chaos in data cleaning, and this characteristic makes the
model well suited to real-world implementation. The increase of 5%-8%
in the measures under the high successful read rate of 90% also
suggests that the model can provide further improvement even in
systems that already have a data cleaning mechanism enabled.
Table 2. Average performance on cleaned data

r (%)  n  P_avg   R_avg   F1_avg
60     3  89.71%  76.30%  82.14%
60     5  89.30%  78.06%  83.04%
60     7  89.73%  78.21%  83.39%
70     3  92.88%  82.58%  87.27%
70     5  92.66%  84.92%  88.49%
70     7  92.79%  85.10%  88.69%
80     3  95.74%  88.40%  91.82%
80     5  95.23%  89.70%  92.32%
80     7  96.46%  90.12%  93.15%
90     3  98.16%  94.61%  96.33%
90     5  97.91%  95.08%  96.45%
90     7  98.40%  94.83%  96.57%
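The recall gains implied by Table 2 can be checked with a few lines of arithmetic. The sketch assumes that the recall of the uncleaned data equals the preset read rate r, which the paper does not state explicitly; the values are copied from Table 2 and the helper `recall_gain` is hypothetical.

```python
# Rows copied from Table 2: (r %, n, P_avg %, R_avg %, F1_avg %).
TABLE_2 = [
    (60, 3, 89.71, 76.30, 82.14),
    (60, 5, 89.30, 78.06, 83.04),
    (60, 7, 89.73, 78.21, 83.39),
    (70, 3, 92.88, 82.58, 87.27),
    (70, 5, 92.66, 84.92, 88.49),
    (70, 7, 92.79, 85.10, 88.69),
    (80, 3, 95.74, 88.40, 91.82),
    (80, 5, 95.23, 89.70, 92.32),
    (80, 7, 96.46, 90.12, 93.15),
    (90, 3, 98.16, 94.61, 96.33),
    (90, 5, 97.91, 95.08, 96.45),
    (90, 7, 98.40, 94.83, 96.57),
]


def recall_gain(rows):
    """Map (r, n) to the gain of R_avg over r, in percentage points."""
    # Assumption: the recall before cleaning equals the preset read rate r.
    return {(r, n): round(r_avg - r, 2) for r, n, _p, r_avg, _f1 in rows}


gains = recall_gain(TABLE_2)
```

Under that assumption, every setting with r = 60% gains more than 15 percentage points of recall, and the gain shrinks as r approaches 90%.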
5. Conclusion
In this paper, we develop a novel P2P RFID data cleaning model on top
of the physical layer of the detection node network in real-world
applications. By collaboratively sending and receiving messages
between related nodes, the model is capable of detecting and removing
false positive and false negative cases automatically to meet the data
cleaning goal. Compared with most existing data cleaning mechanisms,
the P2P model avoids the computation overhead of a centralized server
and the transfer of huge amounts of data, which reduces the total
network cost.
In our simulation, DDCPs with different numbers of detection nodes are
tested at different preset successful read rates, and we achieve
improvements in all test cases. In particular, in settings with a low
accurate read rate, the improvements in recall rate are more than 15%.
These satisfying results provide a way to ease the demand for highly
accurate reading at each RFID reading node while reducing the total
cost of the RFID network. We also find that when the number of nodes
in a DDCP grows, the recall rate of each node in our simulation
actually goes up. This interesting finding shows the robust
scalability of the proposed model, which makes it well suited to
real-world implementation.
References
[1] Siemens to Pilot RFID Bracelets for Health Care. http://www.infoworld.com/article/04/07/23/HNrfid implants1.html (2004).
[2] World’s Third Largest Retailer Completes Warehouse RFID Implementation. http://www.informationweek.com/story/showArticle.jhtml?articleID=57702741 (2005).
[3] Tesco Pushes on with Full-scale RFID Rollout. http://www.computing.co.uk/news/1160636 (2005).
[4] R. B. Ferguson, “Logan Airport to Demonstrate Baggage, Passenger RFID Tracking”, eWeek, 2006.
[5] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma, “Managing RFID data”, Proceedings of the 30th VLDB Conference, 2004, pp. 1189-1195.
[6] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, “WinRFID – A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications”, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, 2005.
[7] C. Floerkemeier and M. Lampe, “Issues with RFID usage in ubiquitous computing applications”, Pervasive Computing: Second International Conference, PERVASIVE, 2004.
[8] S. Jeffery, M. Garofalakis and M. Franklin, “Adaptive cleaning for RFID data streams”, Proceedings of the 32nd international conference on Very large data bases (VLDB), 2006, pp. 63–174.
[9] S. Jeffery, G. Alonso, M. Franklin, W. Hong, and J. Widom, “Declarative support for sensor data cleaning”, Pervasive, 2006.
[10] J. Waldrop, D. W. Engels and S. E. Sanna, “Colorwave: a MAC for RFID reader networks”, In IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, Louisiana, USA, 2003, pp. 1701–1704.
[11] Daniel W. Engels, “The reader collision problem, AUTO-ID Center Whitepaper”, http://autoid.mit.edu/whitepapers/MIT-AUTOID-WH-007.PDF, 2002.
[12] Engels D. W., Sarma S. E., “The reader collision problem”, IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, 2002.
[13] Shijie Zhou, Zongwei Luo, Edward Wong, CJ Tan, “Interconnected RFID Reader Collision Model and its Application in Reader Anti-collision”, IEEE RFID 2007, Texas, USA, 2007.
[14] Shan R. Orlowska M., Li X., “RFID Data Management: Challenges and Opportunities”, IEEE First International Conference on RFID, Grapevine, Texas, USA, 26-28, 2007, pp. 175-182.
[15] Yuan, Y.; Yang, Z.; He, Z.; He, J. Taxonomy and survey of RFID anti-collision protocols, Computer Communications, 29 (2006), pp. 2150–2166.
[16] Su-Ryun Lee, Sung-Don Joo, Chae-Woo Lee, An enhanced dynamic framed slotted ALOHA algorithm for RFID tag identification. The Second Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2005). San Diego, CA, USA, 2005, pp. 166-174.
[17] M. J. Franklin, et al. Design Considerations for High Fan-In Systems: The HiFi Approach. In CIDR. 2005.
[18] Laurie Sullivan. RFID Implementation Challenges Persist, All This Time Later. Information Week, Oct 2005.
[19] Richard Cocci, Yanlei Diao, and Prashant Shenoy. SPIRE: Scalable Processing of RFID Event Streams. In Proceedings of the 5th RFID Academic Convocation, April 2007.
[20] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, “WinRFID – A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications”, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, December 2005.
[21] A. Oram, editor. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, March 2001.
[22] Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7, 205-229.
[23] Ram Swaminathan, “Divide-and-conquer algorithms for graph layout problems”, Networks 28(2):69–85, 1996.
[24] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma. Managing RFID data. Proceedings of the 30th VLDB Conference, pages 1189-1195, 2004.