Location Privacy Protection based on Differential Privacy Strategy for Big Data in Industrial Internet-of-Things

Chunyong Yin, Jinwen Xi, Ruxia Sun, Jin Wang, Member, IEEE
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
Abstract—Most existing location privacy protection methods rely on traditional anonymization, fuzzification, and cryptographic techniques, and achieve little success in the big data environment; for example, sensor networks contain sensitive information that must be appropriately protected. Current trends such as "Industrie 4.0" and the Internet of Things (IoT) generate, process, and exchange vast amounts of security-critical and privacy-sensitive data, which makes them attractive targets for attacks. However, previous methods have overlooked the privacy protection issue, leading to privacy violations. In this paper, we propose a location privacy protection method that satisfies the differential privacy constraint, protecting location data privacy while maximizing the utility of data and algorithms in the Industrial Internet of Things. In view of the high value and low density of location data, we combine utility with privacy and build a multilevel location information tree model. Furthermore, the exponential mechanism of differential privacy is used to select data according to the accessing frequency of tree nodes. Finally, the Laplace mechanism is used to add noise to the accessing frequency of the selected data. As shown by the theoretical analysis and the experimental results, the proposed strategy achieves significant improvements in terms of security, privacy, and applicability.
Index Terms—Differential privacy, location privacy protection, location privacy tree, Internet of Things.
I. INTRODUCTION
The development of information technology has accumulated a great deal of data for today's digital systems.
Big data is a very important research and development resource, and the demand for data publishing, sharing, and analysis is growing rapidly. People pay more and more attention to data security, and governments, enterprises, and individuals are also improving their understanding of privacy protection.

This work was funded by the National Natural Science Foundation of China (61772282, 61373134, 61772454, and 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). (Corresponding author: Jin Wang.)

Chunyong Yin is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jinwen Xi is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Ruxia Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jin Wang is with the College of Information Engineering, Yangzhou University, Yangzhou 225127, China (e-mail: [email protected]).
Body sensor networks (BSNs), a special application of wireless sensor networks (WSNs) [1], are deployed on the surface of the body to periodically monitor physical conditions. For a typical social participatory sensing application, it is important to motivate participation; at the same time, the participatory sensing process should not disclose the private information of any participating party (the private data) or of the community (patterns, distributions, etc.). Therefore, it is essential to develop a privacy-protecting data strategy for the protection of users and the community [2].
Devices in the Internet of Things (IoT) generate, process, and exchange vast amounts of security and safety-critical data as well as privacy-sensitive information, and hence are appealing targets of various attacks. To ensure the correct and safe operation of IoT systems, it is crucial to assure the integrity of the underlying devices, especially their code and data privacy, against malicious modification [3].
The privacy threats of Industrial Internet of Things (IIoT) [4] can be simply divided into two categories: privacy threats based on data and privacy threats on location. Data privacy issues mainly refer to the leakage of secret information in the process of data acquisition and transmission in Internet of Things.
Location privacy is an important part of privacy protection in the Internet of Things. It mainly concerns the location privacy of each node and the location privacy involved in providing various location services, including RFID reader location privacy, RFID user location privacy, sensor node location privacy, and privacy issues in location-based services [3], [5].
Usually, the data collected, aggregated, and transmitted in sensor networks contain personal and sensitive information, which directly or indirectly reveals a person's condition. If such data are not properly protected, their exposure to the public destroys personal privacy. Therefore, protecting the privacy of sensitive data is greatly important [6], [7].
Location data implies moving objects, spatial coordinates, and current time, and it has unique features different from other data: it is discrete and of high value. Before the concept of big data, most privacy protection methods focused on small amounts of non-positional data, and they face limitations in protecting location data privacy in big data, for two main reasons: (1) the fusion of multiple data sources in big data makes traditional anonymity [8], [9], [10] and fuzzification technology difficult to take effect in location privacy protection [11]; (2) traditional cryptography technology takes little effect on the real-time analysis required by big data.
In summary, location privacy protection faces great challenges for big data in IIoT. Therefore, we propose a more rigorous LPT-DP-k location privacy protection method based on a differential privacy strategy for big data in sensor networks.
Our specific contributions mainly include:
(I) We introduce a tree structure, called the LPT (location privacy tree), to represent the location data in sensor networks, according to the characteristics and retrieval difficulty of location data.
(II) The differential privacy strategy is suitable for location privacy protection because it is insensitive to background knowledge, and the DP-k (differential privacy-k) model has a better protection effect. The Laplace and exponential mechanisms are the main implementation mechanisms of differential privacy; they express the degree of privacy protection through the allocated privacy budget and are relatively reliable and rigorous.
(III) We conduct extensive experiments on real-world datasets which show that the proposed location privacy protection method can protect users' location privacy without significantly sacrificing data availability, security, or effectiveness.
Section 2 presents necessary background and related work. Section 3 introduces the preliminary knowledge. Section 4 details the proposed location privacy protection method. The real-world datasets and experimental results are presented in Section 5, and conclusions are drawn in Section 6.
II. RELATED WORK
In the process of data mining and data publishing, the privacy protection of location data mainly involves two aspects: privacy model and utility.
A. Privacy Model
In recent years, with the widespread application of location services, data mining, and data publishing, location privacy protection models can be divided into two categories: one is the traditional anonymous model based on grouping; the other is the differential privacy model, which makes no assumptions about the attacker's background knowledge.
(1) Anonymous model based on grouping
The traditional anonymous model based on grouping plays an important role in location privacy protection. Samarati et al. [12] first proposed the k-anonymity method, followed by a large number of methods based on k-anonymity [13], [14], [15]. Previous research [15], [16] shows that anonymous methods alone do not provide good protection for a wide range of data. In [17], an encryption-based privacy protection method is proposed, which can completely protect the privacy of data and prevent data leakage in the process of location service, but the availability of the data is insufficient. The development of traditional location privacy protection technology has gone through three stages: the "informed and consent" method proposed in [18] was first developed to handle anonymous processing of a single location, and then to handle anonymous processing of users' trajectory data. Heuristic privacy measurement, probability deduction, and private-information-retrieval-based technologies are common methods to protect location privacy. Heuristic privacy measurement mainly protects users who are not in a strict privacy protection environment, with methods such as k-anonymity, t-closeness [19], m-invariance [20], and l-diversity. However, these three kinds of methods assume a unified attacking model, protecting location data under the premise that the attacker has accumulated certain background knowledge. As attackers grasp more background knowledge, these methods cannot effectively protect users' location data privacy; to address this lack of protection, a relational privacy protection method is proposed in [21].
(2) Differential privacy model
The differential privacy model is a strong privacy concept that is completely independent of the attacker's background knowledge and computing ability, and it has become a research hotspot in recent years. Compared with the traditional privacy protection model, differential privacy has unique advantages. Firstly, the model assumes that the attacker has the maximum background knowledge. Secondly, differential privacy has a solid mathematical foundation, a strict definition of privacy protection, and a reliable quantitative evaluation method. In recent years, the differential privacy model has been widely applied in privacy protection. In particular, the FM (functional mechanism) is introduced in [22], in which an objective function with ε-differential privacy disturbance optimization is used to protect privacy. The Diff-FPM (privacy-preserving frequent pattern mining) algorithm is proposed in [23], [24], [25] and combined with MCMC (Markov chain Monte Carlo), which provides privacy protection and maintains high data availability while satisfying (ε, δ)-differential privacy conditions. The PrivBasis and SmartTrunc methods proposed in [26] adopt the differential privacy model in the mining of frequent itemsets, ensuring the privacy and utility of data analysis and anonymity. The DiffP-C4.5 and DiffGen algorithms introduced in [27] combine differential privacy with decision trees and other data structures to maintain a balance between data privacy and availability.
In a word, differential privacy is an effective privacy protection mechanism, which protects the user’s location privacy while keeping enough useful information for data analysis.
B. Utility Maximization
In the process of location service, data mining and data publishing need to protect location privacy and provide enough information for data analysis. Therefore, data utility is the core issue that needs to be paid attention to. Reference [28] uses compressive sensing theory to propose a perception mechanism
to solve the publishing problem of statistical results, which can effectively address the insufficiency of data utility but destroys the links among the data. The DP-topk method proposed in [29] has a more rigorous definition of effectiveness; however, it ignores the relations among transaction data, and because it processes data individually with poor efficiency, the availability of the algorithm is not high. To maximize the utility of the results while meeting the requirement of location privacy protection, this paper proposes the LPT-DP-k algorithm, a more rigorous differential privacy protection method that not only guarantees high availability of the data but also enhances the usability of the algorithm.
III. PRELIMINARY KNOWLEDGE
Differential privacy is a common privacy protection framework supported by solid mathematical theory, which can provide privacy protection for data even in the case that the attacker grasps the largest background knowledge.

Definition 1 (Differential privacy)
Suppose there is a random algorithm M, and P_M is the set of all possible outputs of M. For any pair of adjacent data sets D and D′ (there is at most one different record between them, |D Δ D′| ≤ 1) and any subset S_M of P_M, if the algorithm M satisfies the following inequality, then M satisfies ε-differential privacy protection:

Pr[M(D) ∈ S_M] ≤ exp(ε) · Pr[M(D′) ∈ S_M]    (1)

in which Pr[·] represents the randomness of the algorithm M on the data sets D and D′.

Definition 2 (Sensitivity)
The differential privacy protection method defines two kinds of sensitivity, called global sensitivity and local sensitivity. Suppose there is a query function f : D → R^d whose input is a data set and whose output is a d-dimensional real vector. For any adjacent data sets D and D′,

Δf = max_{D, D′} ‖f(D) − f(D′)‖₁    (2)

is called the global sensitivity of the function f, and Δf represents the maximum change in the output results. ‖f(D) − f(D′)‖₁ is the 1-norm distance between f(D) and f(D′).
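To make Definition 2 concrete, here is a small sketch (our own toy example, not from the paper): for a histogram query that counts records per location label, adding or removing one record changes exactly one bucket by one, so the global sensitivity under the 1-norm is 1.

```python
def histogram_query(dataset, locations):
    """Count how many records fall on each location label."""
    return [sum(1 for r in dataset if r == loc) for loc in locations]

def l1_distance(v1, v2):
    """1-norm distance between two query outputs, as in Eq. (2)."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

locations = [1, 2, 3]
d = [1, 1, 2, 3, 3, 3]       # original data set
d_adj = [1, 1, 2, 3, 3]      # adjacent data set: one record removed

# The two outputs differ in exactly one bucket by 1, so Δf = 1.
sensitivity = l1_distance(histogram_query(d, locations),
                          histogram_query(d_adj, locations))
```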
Definition 3 (Implementation mechanism)
The Laplace mechanism and the exponential mechanism [30] are the two most basic implementation mechanisms of differential privacy protection. In this paper, the Laplace mechanism is used to add noise that obeys the Laplace distribution, in order to realize differential privacy. Assuming the privacy protection algorithm f is based on the Laplace mechanism, the noise follows the Laplace distribution with scale b = Δf/ε and mean 0. The probability density function is:

Pr(x | b) = (1/(2b)) · exp(−|x|/b)    (3)

in which x represents the specific variable and b = Δf/ε.

If the algorithm f selects and outputs r from the output range O with probability proportional to exp(ε·q(D, r)/(2Δq)), then the algorithm f provides ε-differential privacy protection. The concrete formula is as follows:

A(D, q) = { r : Pr[r] ∝ exp(ε·q(D, r)/(2Δq)) }    (4)

in which Δq is the global sensitivity of the scoring function q(D, r). Equation (4) shows that the higher the score is, the larger the probability of being selected as the output will be.
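As an illustration of the Laplace mechanism in Definition 3, the following sketch (in Python; the function names are our own, not the paper's) draws Laplace noise with scale b = Δf/ε by inverse-CDF sampling and perturbs a count query whose global sensitivity is 1:

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5                 # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value + Lap(sensitivity / epsilon), per Eq. (3)."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

# A counting query has global sensitivity 1: adding or removing one
# record changes the count by at most 1.
noisy_count = laplace_mechanism(28, sensitivity=1.0, epsilon=0.5,
                                rng=random.Random(42))
```

A smaller ε yields a larger noise scale Δf/ε, i.e., stronger privacy protection at the cost of utility, matching the trade-off discussed throughout the paper.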
IV. LPT-DP-K LOCATION PRIVACY PROTECTION
This section introduces the overview of LPT-DP-k algorithm and the specific implementation details of the algorithm.
A. LPT-DP-k Algorithm
Firstly, we construct the location privacy tree to maintain the relations among the location data. Secondly, we select the sensitive location information that is most likely to disclose privacy and add noise to it. The algorithm is summarized as follows:

Algorithm 1 LPT-DP-k algorithm
1: Initialize: data set D*, location data set D with D ⊆ D*, differential privacy parameter ε, parameter k, the accessing count count of each node, and empty sets A, B.
2: Construct a location privacy tree (LPT).
3: Divide the node data into two categories, and record the frequent patterns whose accessing frequency is not less than count; see Section IV-B.
4: Calculate the output probability by Pr(aᵢ) = aᵢ.w / Σ_{j=1}^{n} aⱼ.w; see Section IV-C.
5: Add noise and combine it with LPT_D^{D*} to construct the noisy location privacy tree LPT_D^{D*}(k); see Section IV-D.
6: Finally, publish the noisy tree LPT_D^{D*}(k) obtained in step 5.
B. Construction of Location Information Tree Corresponding to Location Data

Assume the location privacy tree LPT_D^{D*} corresponds to the data set D, which records each person's visiting records within a month. According to the location information, the LPT-DP-k algorithm uses a tree structure that better preserves correlation. The corresponding Table I is shown as follows:
TABLE I
LOCATION CORRESPONDENCE

Number  Location information               Accessing count
1       Computer building                  28
2       Remote Sensing Institute           13
3       All-in-one card management center  2
4       Mingde Building                    6
5       Training Gym                       5
6       Dormitory 3                        30
……      ……                                 ……
As shown in Table I, the user's location information and the accessing frequency of each place within a month are listed; from this correspondence we can derive the location data information table.
TABLE II
LOCATION DATA INFORMATION

Number  Location label  Accessing count
1-28    1               28
29-41   2               13
42-43   3               2
44-49   4               6
50-54   5               5
55-84   6               30
85-97   1,2             13
……      ……              ……
According to the content in Table II, we can construct the location privacy tree shown in Fig. 1.
Fig. 1. Location privacy tree. (The root node is 0; each node is labeled with its location pattern and accessing count, e.g., 1#28, 2#13, 1,3#2, 1,2#13, 1,2,3#2, 1,2,4#6, 1,2,3,4#2, 1,2,3,5#2, 1,2,3,6#2.)
As shown in Fig. 1, the total number of nodes in LPT_D^{D*} is Σ_{i=1}^{|D*|} C(|D*|, i) = 2^{|D*|} − 1.
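To make the construction concrete, the sketch below (our own illustration; class and function names are assumptions, not the authors' code) builds a small prefix tree of location patterns with accessing counts in the style of Table II, so that patterns sharing a prefix share a branch:

```python
class LPTNode:
    """A node of a location privacy tree: location label -> children."""
    def __init__(self):
        self.children = {}   # label -> LPTNode
        self.count = 0       # accessing count of the pattern ending here

def build_lpt(patterns):
    """Build an LPT from (pattern, accessing_count) pairs.

    Each pattern is a tuple of location labels, e.g. (1, 2, 3); shared
    prefixes share tree branches, which preserves the correlation
    among location records.
    """
    root = LPTNode()
    for pattern, count in patterns:
        node = root
        for label in pattern:
            node = node.children.setdefault(label, LPTNode())
        node.count += count
    return root

def lookup(root, pattern):
    """Return the accessing count stored for a pattern, or 0."""
    node = root
    for label in pattern:
        node = node.children.get(label)
        if node is None:
            return 0
    return node.count

# A few patterns in the style of Table II.
table2 = [((1,), 28), ((2,), 13), ((1, 2), 13), ((1, 2, 3), 2), ((1, 2, 4), 6)]
lpt = build_lpt(table2)
```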
C. Weighted Selection based on Exponential Mechanism
We use the exponential mechanism of the differential privacy protection model to perform weighted selection. The formula is as follows:

Pr(aᵢ) = aᵢ.w / Σ_{j=1}^{n} aⱼ.w    (5)

Pr(aᵢ) represents the probability of pattern aᵢ being selected and aᵢ.w represents the weight of the pattern aᵢ. The selection algorithm based on the exponential mechanism is as follows:

Algorithm 2 Selection algorithm
1: Initialize: records in the frequent pattern set A.
2: Input the set A of frequent pattern records and score each record aᵢ.
3: Calculate the weight aᵢ.w of each pattern record.
4: Substitute the weight aᵢ.w of each pattern in the set A into formula (5), select k frequent pattern records aᵢ, and record them as the set B.

The scoring function is:

M(A, aᵢ) = Q(aᵢ)    (6)

Q(aᵢ) represents the accessing frequency of pattern aᵢ. The weight aᵢ.w of each pattern record is calculated as:

aᵢ.w = exp(ε · M(A, aᵢ) / (2ΔM))    (7)

This paper sets the scoring function M(A, aᵢ) = Q(aᵢ), where Q(aᵢ) represents the accessing frequency of pattern aᵢ and M(A, aᵢ) represents the scoring value of aᵢ. The formula to calculate ΔM is as follows:

ΔM = max_{1 ≤ i, j ≤ N} |Q(aᵢ) − Q(aⱼ)|    (8)

ΔM represents the maximum difference in the accessing frequency among the N data record patterns.
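A minimal sketch of the weighted selection in Eqs. (5)-(8) (function and variable names are ours, not the paper's): each pattern is scored by its accessing frequency, weighted according to Eq. (7), and k patterns are then drawn without replacement with probability proportional to their weights:

```python
import math
import random

def select_k_patterns(freqs, epsilon, k, rng=random):
    """Exponential-mechanism selection of k patterns.

    freqs: dict mapping pattern -> accessing frequency Q(a_i).
    Score M(A, a_i) = Q(a_i) per Eq. (6); sensitivity ΔM is the maximum
    pairwise difference of frequencies per Eq. (8); weights follow
    Eq. (7) and selection probabilities follow Eq. (5).
    """
    values = list(freqs.values())
    delta_m = (max(values) - min(values)) or 1.0   # Eq. (8); guard against 0
    weights = {a: math.exp(epsilon * q / (2.0 * delta_m))
               for a, q in freqs.items()}          # Eq. (7)
    selected = []
    pool = dict(weights)
    for _ in range(min(k, len(pool))):
        total = sum(pool.values())
        r = rng.random() * total                   # draw per Eq. (5)
        acc = 0.0
        for a, w in pool.items():
            acc += w
            if r <= acc:
                selected.append(a)
                del pool[a]                        # sample without replacement
                break
    return selected

freqs = {"1": 28, "2": 13, "3": 2, "4": 6, "5": 5, "6": 30}
chosen = select_k_patterns(freqs, epsilon=1.0, k=3, rng=random.Random(7))
```

Patterns with higher accessing frequency receive exponentially larger weights, so they are more likely to be selected, consistent with Eq. (4).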
D. Noise Enhancement based on Laplace Mechanism
This paper proposes a node data processing method to improve the efficiency of data processing and the effectiveness of the algorithm. The concrete steps are as follows:

Algorithm 3 Laplace noising algorithm
1: Initialize: the privacy budget ε, the set B of k frequent patterns after weighted selection, and the query function f.
2: Add noise to the frequent pattern set B to perturb the real accessing frequency of the k records. The added noise keeps to the Laplace distribution whose probability density function is Pr(x | b) = (1/(2b)) · exp(−|x|/b), as in (3).
3: According to (9), calculate the noise set C. Finally, generate the noisy location privacy tree LPT_D^{D*}(k) by combining the original tree LPT_D^{D*} with the set C.

Therefore, the following should be satisfied in every round of the noise-adding process:

f̃(b) = f(b) + Lap(Δf / ε)    (9)

B is the set of the k frequent patterns b, f(b) is the query function on the record b, and Lap(Δf/ε) represents the noise that keeps to the Laplace distribution, as shown in Fig. 2.
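Continuing the sketch (names are our own, not the paper's), the accessing frequencies of the k selected patterns can be perturbed per Eq. (9); since adding or removing one visit changes a frequency count by at most one, Δf = 1 is assumed for the counting query:

```python
import random

def noised_frequencies(selected, epsilon, sensitivity=1.0, rng=None):
    """Apply f(b) + Lap(Δf/ε) to each selected pattern's frequency, Eq. (9).

    selected: dict mapping pattern -> true accessing frequency f(b).
    Returns a dict of noisy frequencies (the noise set C merged in).
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noisy = {}
    for pattern, freq in selected.items():
        # random.Random has no Laplace sampler; the difference of two
        # exponential draws with the same rate is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[pattern] = freq + noise
    return noisy

selected = {"1": 28, "6": 30, "2": 13}   # k = 3 patterns after selection
noisy = noised_frequencies(selected, epsilon=0.5, rng=random.Random(1))
```

The exponential-difference trick avoids pulling in extra dependencies: if X, Y are independent Exp(1/b) variables, X − Y follows Laplace(0, b) with b = Δf/ε.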
Fig. 2. Relationship between privacy budget and noise size.

True positive (TP), false positive (FP), and accuracy [31] can be used to measure the utility of the noisy location privacy tree LPT_D^{D*}(k). The false rejection rate (FRR) is used to analyze the effectiveness of the algorithm; the smaller the FRR is, the higher the effectiveness of the algorithm will be.

True positive (TP): the number of frequent patterns that appear both in f_k(D) and in f_k(D′):

TP = |f_k(D) ∩ f_k(D′)|    (10)

False positive (FP): the number of frequent patterns that do not appear in f_k(D) but appear in f_k(D′):

FP = |f_k(D′) − f_k(D)|    (11)

Accuracy:

Accuracy = |f_k(D) ∩ f_k(D′)| / |f_k(D)|    (12)

False rejection rate (FRR):

FRR = |f_k(D) − f_k(D′)| / |f_k(D)|    (13)

Error:

Error = ‖LPT_D^{D*} − LPT_D^{D*}(k)‖₂ / |LPT_D^{D*}|    (14)

V. EXPERIMENTAL ANALYSIS

A. Experiment Setting

Data set: This paper uses the check-in data set from Gowalla (the Gowalla total check-in data set). This data set has a large amount of data and records the information of the users. Table III shows the experimental data set.

TABLE III
EXPERIMENTAL DATA SET

User id  Check-in time          Latitude     Longitude     Location id
0        2010-10-19T23:55:27Z   30.23590912  -97.79513958  22847
0        2010-10-18T22:17:43Z   30.26910295  -97.74939537  420315
0        2010-10-17T23:42:03Z   30.25573099  -97.76338577  316637
0        2010-10-12T19:44:40Z   30.26910295  -97.74939537  420315

Operating environment and parameter configuration: The experimental environment in this paper is MATLAB (R2013b) and PyCharm (part of the program uses Python). The hardware environment is a 2.60 GHz Core(TM) i7-6700HQ CPU with 20.00 GB of memory on a 64-bit Win10 system. The parameters used in the experiment are shown in Table IV.

TABLE IV
PARAMETER SETTING

Parameter                     Value
k                             20, 40, 60, 80, 100, 150, 200, 300, 500, 1000, 2000, 5000, 10000, 20000
ε                             0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.075, 1, 1.1, 1.25, 1.5, 1.75, 5, 10, 15
Location id                   1, 2, 3, 4, 5, 6
Location accessing frequency  28, 13, 2, 6, 5, 30

B. Analysis of Algorithm Timeliness

The timeliness analysis mainly covers three aspects: (1) timeliness of constructing the location privacy tree; (2) timeliness of weighted selection and updating of the location privacy tree; (3) the efficiency of extracting location data.

TABLE V
TIME OF BUILDING AND ADDING NOISE TO RECONSTRUCT THE LOCATION INFORMATION TREE
[For pattern counts n from 16 to 4096, the build time grows from roughly 1.5×10⁻⁶ s to 4.4×10⁻⁵ s, while the update time stays around 1.0×10⁻⁴ s; the detailed per-row values are illegible in the source.]

As shown in Table V, the efficiency of constructing the location privacy tree remains high as the pattern count n changes.

TABLE VI
EXTRACTION EFFICIENCY OF LOCATION DATA PRIVACY

ε                  0.01  0.03  0.05  0.1  0.5  1.0  1.5
Extracted records  3432  3691  4254  [remaining entries illegible in the source]

As shown in Table VI, within a certain range, the extraction efficiency of location data increases with the increase of ε. As ε increases, the degree of privacy protection decreases and the number of extractions per unit time increases.

C. Analysis of Data Utility

This paper extracts data from multiple groups of patterns with 500, 1000, and 2000 data records. The blue dots represent the insensitive locations and the red dots indicate the sensitive locations. The experimental results are shown in Figs. 3, 4, and 5.
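The utility metrics of Eqs. (10)-(13) reduce to set operations between the true top-k pattern set f_k(D) and the noisy top-k set f_k(D′); a sketch with our own helper names:

```python
def utility_metrics(true_topk, noisy_topk):
    """TP, FP, accuracy, and FRR per Eqs. (10)-(13).

    true_topk:  set of frequent patterns mined from the original tree.
    noisy_topk: set of frequent patterns mined from the noisy tree.
    """
    true_topk, noisy_topk = set(true_topk), set(noisy_topk)
    tp = len(true_topk & noisy_topk)                     # Eq. (10)
    fp = len(noisy_topk - true_topk)                     # Eq. (11)
    accuracy = tp / len(true_topk)                       # Eq. (12)
    frr = len(true_topk - noisy_topk) / len(true_topk)   # Eq. (13)
    return tp, fp, accuracy, frr

tp, fp, acc, frr = utility_metrics({"a", "b", "c", "d"}, {"a", "b", "x"})
```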
Fig. 3. Data distribution before and after protection (k is 500). (a) The locations before adding noise. (b) The locations after adding noise.

The red sensitive locations are 103 and the blue insensitive locations are 397 in Fig. 3(a); after protection, the red sensitive locations are 137 and the blue insensitive locations are 353 in Fig. 3(b).

Fig. 4. Data distribution before and after protection (k is 1000). (a) The locations before adding noise. (b) The locations after adding noise.

There are 141 red sensitive locations and 859 blue insensitive locations in Fig. 4(a); after protection, there are 219 red sensitive locations and 781 blue insensitive locations in Fig. 4(b), which shows that the sensitive locations have increased.

Fig. 5. Data distribution before and after protection (k is 2000). (a) The locations before adding noise. (b) The locations after adding noise.

There are 235 red sensitive locations and 1765 blue insensitive locations in Fig. 5(a); after protection, there are 380 red sensitive locations and 1620 blue insensitive locations in Fig. 5(b). Table VII gives the statistical results of the data utility.

TABLE VII
UTILITY STATISTICS UNDER DIFFERENT DATA MODELS

                Before the experiment                  After the experiment
Pattern number  Sensitive (red)  Insensitive (blue)    Sensitive (red)  Insensitive (blue)
500             103              397                   137              353
1000            141              859                   219              781
2000            235              1765                  380              1620

We compare the proposed method with three methods: TDPS_LP_Result, TDPS_Signal, and TDPS_EP. The experimental results are shown in Fig. 6.

Fig. 6. Error comparison of adding-noise methods.

Fig. 6 shows that: (1) the noise error of the LPT-DP-k algorithm is the smallest, and when ε ≥ 0.015 the error begins to stabilize, which shows that the utility of the proposed method is higher and better than that of the previous methods; (2) as the privacy parameter ε increases, the error becomes smaller; to satisfy ε-differential privacy, the smaller ε is, the more noise needs to be added and the higher the privacy protection level will be; (3) for the TDPS_LP_Result, TDPS_Signal, and TDPS_EP methods, larger privacy parameters are required to achieve a lower error; when the privacy parameters are small, the error is relatively large. The experimental results are shown in Tables VIII and IX.
51251020
thacthwhm
U
0.0.00
1
th
LP7
Fig
Fig
reva
TPk LPR LPS
50 46 47 00 80 78 00 126 126 00 305 290 000 530 540 2000 680 940
As shown in e accuracy of
ccuracy of the e accuracy of then 500k ,
more than 80%.
UTILITY COMPARI
TP LPR LPS
.01 136 184
.05 178 182 0.1 174 180 0.5 180 190 1 182 188 .5 184 190
As shown in e higher the pFinally, the
PT-DP-k by Fand 8:
g. 7. FRR compa
g. 8. FRR compa
Assuming ksult is shown i
alue of FRR w
P EP DPk LPR 26 49 4 39 96 20 70 190 74
150 455 195 230 890 470 40 1720 1320
Table VIII: (1f 4 methods dprevious threethe proposed mthe accuracy
. TA
ISON OF TOP k F
FREQUENT PAT
P EP DPk LPR 70 188 64 64 188 22 90 190 26
160 190 20 168 188 18 174 190 16
Table IX, therivacy protectpaper compa
FRR. The expe
arison when k
arison when =1 is 100, the pin Fig. 7: with
with DP-topk an
FP LPS EP DPk
3 4 1 22 61 4 74 130 10
210 350 45 460 770 110 1060 1960 280
1) with the valdecreases; (2) e methods is lmethod still rey decreases bu
ABLE IX
FREQUENT ITEMSE
TTERNS ( 20k FP
LPS EP DPk16 130 12 18 136 12 20 110 10 10 40 10 12 32 12 10 26 10
smaller the ption level will ares the utiliterimental resul
100
privacy parameh privacy paramnd the propose
Accuracyk LPR LPS EP
0.93 0.95 0.50.80 0.78 0.30.63 0.63 0.30.61 0.58 0.30.53 0.54 0.20.38 0.47 0.2
lue of k increwhen 20k
less than 70%,emains at abouut still mainta
ET MINING UNDER
0 )
Accuracyk LPR LPS EP
0.68 0.92 0.30.89 0.91 0.30.87 0.90 0.40.90 0.95 0.80.91 0.94 0.80.92 0.95 0.8
privacy paramebe.
ty of DP-toplts are shown i
eter is changemeter increasined method mai
y P DPk3 0.989 0.965 0.950 0.913 0.890 0.86
easing, 0 , the , while
ut 95%, ains at
R FIXED
y P DPk5 0.942 0.945 0.950 0.954 0.947 0.95
eter is,
pk and in Fig.
ed, the ng, the intains
The FRR of the compared methods is low, but the value of the proposed method is lower. From this we can see that the availability of the proposed method is better when the protection level is high.

Assuming ε is 1 and the value of k is changed, the result is shown in Fig. 8: with k increasing, the value of FRR increases, but the FRR of DP-topk is larger than that of the method in this paper. To sum up, the proposed method has obvious advantages both in the level of location privacy protection and in the effectiveness of the algorithm.

VI. CONCLUSION

This paper proposes a location privacy protection method based on a differential privacy strategy for big data in sensor networks. The method expresses the position data set by constructing the location information tree model, which solves the problem that location data is difficult to express because of its high dispersion and low density, and it adds noise information to cover the original trajectory and position data. The method is more effective in protecting the privacy of data while maintaining high availability of both the data and the algorithm. The differential privacy protection model is applied to the protection of location privacy: compared with traditional location privacy protection algorithms, the proposed algorithm is more rigorous and has higher utility and processing efficiency. In the next step, we will explore more efficient data structures to express location information in the process of location data expression and propose more utility target functions for different application scenarios.
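The pipeline summarized here has three steps: build a multilevel location information tree, select nodes with the differential-privacy index (exponential) mechanism weighted by node access frequency, and perturb the released frequencies. A minimal sketch of the selection step, assuming a hypothetical two-level tree (the region/cell names, counts, and unit sensitivity are illustrative; the paper's exact tree construction is not reproduced):

```python
import math
import random

# Hypothetical two-level location information tree: region -> cell -> access count.
LOCATION_TREE = {
    "region_A": {"cell_1": 130, "cell_2": 61},
    "region_B": {"cell_3": 26, "cell_4": 4},
}

def exponential_mechanism(candidates, epsilon, sensitivity=1.0, seed=7):
    """Index (exponential) mechanism: pick a candidate with probability
    proportional to exp(epsilon * score / (2 * sensitivity)), where the
    score of a node is its access frequency."""
    rng = random.Random(seed)
    weights = {c: math.exp(epsilon * s / (2.0 * sensitivity))
               for c, s in candidates.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for c, w in weights.items():
        r -= w
        if r <= 0:
            return c
    return c  # guard against floating-point underrun

# Flatten the leaf nodes and privately select a frequently accessed cell.
leaves = {cell: n for cells in LOCATION_TREE.values() for cell, n in cells.items()}
print(exponential_mechanism(leaves, epsilon=0.5))
```

With a moderate ε the most-visited cell is selected almost surely, while a very small ε makes the choice nearly uniform, which is the utility/privacy dial the evaluation section varies.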
Chunyong Yin received the B.S. degree from Shandong University of Technology, China, in 1998, and the M.S. and Ph.D. degrees in computer science from Guizhou University, China, in 2005 and 2008, respectively. He was a post-doctoral research associate at the University of New Brunswick, Canada, in 2011 and 2012. He is a Professor and Dean with the Nanjing University of Information Science & Technology, China. He has authored or coauthored more than twenty journal and conference papers. His current research interests include privacy preserving, sensor networking, machine learning and network security. He is a member of ACM.

Jinwen Xi received his bachelor degree in computer science and technology from Huaiyin Institute of Technology, China, in 2011. He is now studying for his master's degree in computer science and technology at the Nanjing University of Information Science and Technology, China. His current research interests are in applied cryptography, big data security and privacy.

Ruxia Sun received the B.E. degree in electrical engineering from Shandong University of Technology, China, in 1997. She is an associate Professor at the Nanjing University of Information Science & Technology, China. Her current research interests include cyber physical systems, machine learning and mathematical modeling.

Jin Wang received the B.S. and M.S. degrees from Nanjing University of Posts and Telecommunications, China, in 2002 and 2005, respectively, and the Ph.D. degree from Kyung Hee University, Korea, in 2010. He is a professor in the College of Information Engineering, Yangzhou University. His research interests mainly include routing algorithm design, performance evaluation and optimization for wireless ad hoc and sensor networks. He is a Member of IEEE and ACM.