Time-Series Pattern Based Effective Noise Generation for Privacy Protection on Cloud

Gaofeng Zhang, Xiao Liu, and Yun Yang

Abstract—Cloud computing is proposed as an open and promising computing paradigm where customers can deploy and utilize IT services in a pay-as-you-go fashion while saving huge capital investment in their own IT infrastructure. Due to the openness and virtualization, various malicious service providers may exist in these cloud environments, and some of them may record service data from a customer and then collectively deduce the customer’s private information without permission. Therefore, from the perspective of cloud customers, it is essential to take certain technical actions to protect their privacy at client side. Noise obfuscation is an effective approach in this regard by utilizing noise data. For instance, noise service requests can be generated and injected into real customer service requests so that malicious service providers would not be able to distinguish which requests are real ones if these requests’ occurrence probabilities are about the same, and consequently related customer privacy can be protected. Currently, existing representative noise generation strategies have not considered possible fluctuations of occurrence probabilities. In this case, the probability fluctuation could not be concealed by existing noise generation strategies, and it is a serious risk for the customer’s privacy. To address this probability fluctuation privacy risk, we systematically develop a novel time-series pattern based noise generation strategy for privacy protection on cloud. First, we analyze this privacy risk and present a novel cluster based algorithm to generate time intervals dynamically. Then, based on these time intervals, we investigate corresponding probability fluctuations and propose a novel time-series pattern based forecasting algorithm. Lastly, based on the forecasting algorithm, our novel noise generation strategy can be presented to withstand the probability fluctuation privacy risk. The simulation evaluation demonstrates that our strategy can significantly improve the effectiveness of such cloud privacy protection to withstand the probability fluctuation privacy risk.

Index Terms—Cloud computing, privacy protection, noise obfuscation, noise generation, time-series pattern, cluster


1 INTRODUCTION

CLOUD computing is positioning itself as a new and hopeful platform for delivering information infrastructures and resources as IT services [1]. For example, cloud customers can access these services to execute their tasks in a pay-as-you-go fashion while saving huge capital investment in their own IT infrastructure [2]. However, these customers often have concerns about whether their private information can be protected when using IT services on cloud, since they do not have much control inside the cloud [3]. Without proper privacy protection, customers may eventually lose the confidence in and desire to deploy cloud computing in practice [4]. Therefore, privacy protection is a critical issue in cloud computing.

On cloud, there are many organizations which operate under various privacy-related regulations and policies for protecting their customers’ privacy. Meanwhile, a large number of unknown and malicious service providers may exist in open and virtualized cloud environments during the rapid development of cloud. Such service providers may collect service information from cloud customers to analyze and deduce customers’ privacy without permission or authorization.

For service providers, it is a common phenomenon to collect their customers’ information, like service requests. From large to small firms, they commonly use it to analyze customers’ behaviors, habits, and other sensitive information [5]. Most ethical ones have adequate self-control to use the information in conformance with privacy-related regulations and policies, but some others may abuse this information in unethical ways. Besides, these features also make it difficult for customers to distinguish which service providers are trustworthy (ethical or unethical). Existing representative privacy protection approaches at server side have not taken this situation into thorough consideration. For this type of cloud privacy risk, it is natural that customer privacy should be protected by taking certain technical actions automatically at client side, without involvement of service providers. Privacy protection at client side [3], [6] is an open issue on cloud.

In this regard, noise obfuscation is a promising approach to protect customer privacy at client side in cloud computing. For example, it generates and injects some noise service requests into real customer service requests automatically. To some extent, these noise requests are extra requests just ‘like’ real ones, generated by noise obfuscation rather than customers’ actual operations. After noise obfuscation, noise requests and real ones make up final service requests. When these final requests’ occurrence probabilities are about the same, service providers cannot distinguish which requests are real ones with adequate confidence. The key advantage is that this approach does not (and should not) need any cooperation or assistance from service providers. Besides, compared to other approaches [7], [8] for privacy protection at client side, noise obfuscation is also flexible for deployment in terms of the pay-as-you-go style of cloud. Specifically, customers can control and balance the noise obfuscation function based on privacy protection requirements and budget (cost limits) to match different situations on the cloud. Hence, it is suitable and promising for cloud customers to protect their privacy.

G. Zhang is with the School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC 3122, Australia. E-mail: [email protected].

X. Liu is with the Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, Eastern China Normal University, Shanghai 200241, China. E-mail: [email protected].

Y. Yang is with the School of Computer Science and Technology, Anhui University, Hefei, Anhui 230039, China, and the School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC 3122, Australia. E-mail: [email protected].

Manuscript received 4 Apr. 2013; revised 14 Nov. 2013; accepted 29 Dec. 2013. Date of publication 8 Jan. 2014; date of current version 8 Apr. 2015. Recommended for acceptance by K. Li. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TC.2014.2298013

IEEE TRANSACTIONS ON COMPUTERS, VOL. 64, NO. 5, MAY 2015

0018-9340 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Currently, a historical probability based noise generation strategy (HPNGS) has been proposed to reduce the cost of noise obfuscation on cloud [9]. Compared to conventional random noise generation [10], HPNGS generates noise requests based on historical probabilities, so that final requests, including noise ones and real ones, can reach about the same occurrence probabilities with far fewer noise requests. Under the pay-as-you-go style of cloud computing, fewer noise requests mean less cost and even lower energy consumption [11].

In reality, due to the dynamics of cloud computing, occurrence probabilities of real service requests may have some fluctuations. However, the existing strategies, including HPNGS, have not taken these fluctuations into account thoroughly when generating noise service requests. In other words, they do not use time intervals within an entire time period to investigate these probabilities for noise obfuscation. For instance, HPNGS can reach about the same probabilities of final service requests over the entire time period, but this may not be the case at every time interval, because of probability fluctuations of real service requests. As a result, malicious service providers can still find out probability fluctuations of final requests and are able to deduce customer privacy from these fluctuations. Hence, this is a serious privacy risk. Besides, random noise generation [10] does not consider this privacy risk either.

In this paper, we use ‘time interval’ to denote the time period over which occurrence probabilities of service requests are counted. Several consecutive time intervals can make up one ‘time segment’, which can express the fluctuation of occurrence probabilities. A ‘time element’ is the smallest time unit, and time elements make up a time interval. The formal details of these time units will be presented in Sections 3 and 4.3.

To address this privacy risk, we aim to make occurrence probabilities of final requests about the same at every time interval. Hence, we develop a novel time-series pattern based noise generation strategy (TPNGS) for privacy protection on cloud. In this strategy, at first, we analyze the probability fluctuation privacy risk from the perspective of time intervals, and discuss time interval generation by clustering. At each time interval, occurrence probabilities of service requests can be counted, and probability fluctuations can be expressed by these probabilities at a series of consecutive time intervals as a time segment. Then, we study past occurrence probabilities of real requests at these time intervals, and deduce some time segments as time-series patterns by time-series segmentation. Based on these time-series patterns, we analyze current occurrence probabilities of real requests and forecast ‘future’ probabilities of real requests with pattern matching. At last, based on the forecast results, we generate noise service requests to protect customer privacy by concealing ‘future’ probability fluctuations. In other words, these noise requests allow final requests to reach the goal that all occurrence probabilities are stably kept about the same, even at time intervals with probability fluctuations. Besides, since this equality state holds at every time interval, the overall final occurrence probabilities are also in the equality state for the entire time period, due to the accumulation feature of these probabilities.

Let us take a weather service on cloud as a motivating example. One customer, who often travels to one city in Australia, like ‘Sydney’, checks the weather report regularly from a weather service on cloud before departure. The frequent appearance of service requests about the weather report for ‘Sydney’ can reveal the privacy that the customer usually goes to ‘Sydney’. But if a system automatically injects other requests like ‘Perth’ or ‘Darwin’ into the ‘Sydney’ queue, the service provider cannot distinguish which ones are real and which ones are ‘noise’. These requests are all responded to, and they make it hard to reveal the location privacy of the customer. In such cases, the ‘Sydney’ privacy can be protected by noise obfuscation in general. One approach for improving this process is to decrease noise requests as in [9], under the pay-as-you-go style of cloud computing. But, given the privacy risk identified earlier in this paper, the customer could go to ‘Sydney’ in this month and ‘Perth’ in the next month. So, probabilities of real requests may have some fluctuations: the ‘Sydney’ request probability is high in this month and low in the next month; the ‘Perth’ request probability is low in this month and high in the next month. In the view of an entire service period, occurrence probabilities of ‘Sydney’ and ‘Perth’ may already be about the same for noise obfuscation and privacy protection. But in the view of time intervals, customer privacy can still be deduced, because these unconcealed fluctuations can reveal that the person goes to ‘Sydney’ in this month and ‘Perth’ in the next month. To address this, the goal of privacy protection in this paper is to keep occurrence probabilities of final requests about the same at every time interval, instead of only over the entire time period. To achieve this goal, we will forecast these fluctuations by time-series patterns and generate noise service requests accordingly. In this example, privacy is the location information in service requests, not the actual requests. We consider customer privacy without specific data structures as a general case for noise obfuscation in this paper, and this paper’s work can be extended to other privacy types.

In this example, a time interval is one month, and a time segment could be four months with a whole probability fluctuation. Besides, a time element could be one day, which denotes the minimum time unit used in this example.

According to the previous discussions, how to generate time intervals is a crucial issue in this paper, because they decide how probability fluctuations are expressed. In the motivating example, the issue is why the time intervals are ‘this month’ and ‘the next month’. For privacy attackers, time intervals are a mechanism to view the probability fluctuations introduced before by controlling how they are expressed. For instance, in the motivating example, if the length of time intervals is two months, the probability fluctuations (‘Sydney’ is high in this month and low in the next month, ‘Perth’ is low in this month and high in the next month) may not be expressed: the probabilities are about the same in the two-month period. Hence, for privacy protection at client side, noise obfuscation has to consider the time intervals which are utilized by privacy attackers at server side. Briefly, time intervals that are too long may cause protection to fail, whilst time intervals that are too short may cause unnecessary cost. In Section 3, we will discuss this in detail. In the preliminary version of this paper [12], we only analyzed fixed and pre-set time intervals to illustrate probability fluctuations. Hence, in this paper, we will extend these time intervals to be flexible and generate them to withstand various privacy attackers for noise obfuscation and privacy protection.
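The effect of the interval length on what an observer can see may be easier to follow with a small numerical sketch. The Python snippet below is only an illustration under assumed data: the request counts and the helper names `occurrence_probabilities` and `probabilities_per_interval` are hypothetical and do not come from the paper. It counts occurrence probabilities once with one-month intervals and once with a single two-month interval.

```python
from collections import Counter


def occurrence_probabilities(requests):
    """Return {request: relative frequency} for a non-empty list of requests."""
    counts = Counter(requests)
    total = len(requests)
    return {query: count / total for query, count in counts.items()}


def probabilities_per_interval(requests_by_month, interval_length):
    """Merge consecutive months into intervals of `interval_length` months
    and count the occurrence probabilities inside each interval."""
    result = []
    for start in range(0, len(requests_by_month), interval_length):
        months = requests_by_month[start:start + interval_length]
        merged = [query for month in months for query in month]
        result.append(occurrence_probabilities(merged))
    return result


# Hypothetical request log: 'Sydney' dominates month 1, 'Perth' dominates month 2.
month_1 = ["Sydney"] * 8 + ["Perth"] * 2
month_2 = ["Sydney"] * 2 + ["Perth"] * 8
months = [month_1, month_2]

one_month = probabilities_per_interval(months, 1)   # fluctuation is visible
two_months = probabilities_per_interval(months, 2)  # fluctuation is averaged out

print(one_month)
print(two_months)
```

With these assumed counts, the one-month view shows the swing between the two cities, while the two-month view reports equal probabilities for both, which is the situation described for an interval that is too long.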

In summary, the contributions of this paper are:

1. We first investigate the fluctuations of occurrence probabilities which can jeopardize existing noise obfuscation and threaten customer privacy, as the probability fluctuation privacy risk introduced before.

2. With the novel privacy risk model addressed in Section 3, we analyze the withstanding between privacy attackers and privacy protectors (PP) in terms of time intervals and probability fluctuations for noise obfuscation.

3. The novel dynamic cluster based time interval generation algorithm (CTIG) for noise generation is presented to provide dynamic time intervals for probability forecasting and noise obfuscation.

4. Based on time intervals, the novel time-series pattern based forecasting algorithm (TPF) is proposed to abstract past probability fluctuations, and forecast future occurrence probabilities (probability fluctuations).

5. Our novel time-series pattern based noise generation strategy is presented to improve the effectiveness of privacy protection on noise obfuscation to withstand the probability fluctuation privacy risk based on the above model and algorithms in cloud environments.

Besides, noise obfuscation, including our novel noise generation strategy, can be utilised in other Internet-based or distributed systems, not only cloud. However, as introduced before, noise obfuscation (e.g., TPNGS) can provide a unique privacy protection approach to match the open, virtualized and pay-as-you-go cloud by balancing privacy protection and cost, especially under the probability fluctuation privacy risk.

A preliminary version of this paper [12] has considered the probability fluctuations. In this paper, we mainly extend noise generation in terms of time interval generation, from fixed time intervals to dynamic time intervals, compared to [12]. In other words, it is a generalization process. Specifically, 1) we first discuss the privacy risk model, which focuses on the withstanding about time intervals between privacy attackers and privacy protectors; 2) the novel CTIG is designed to provide dynamic time interval generation to withstand various privacy attackers in terms of noise obfuscation; 3) we simulate this dynamic time interval generation to support TPNGS in a much deeper view, compared to [12].

Currently, the distributed denial-of-service (DDoS) attack has become a very serious attack on Internet services. Our noise requests in noise obfuscation should not be viewed as a DDoS attack for two main reasons: 1) the number of our noise requests is much smaller than in a common DDoS attack, which normally has millions of requests [13]; 2) our noise requests are located at a high level of the service process, which is different from most DDoS requests at a low Internet level, such as the ACK message in a TCP 3-way handshake [13].

The remainder of the paper is organized as follows. In Section 2, we overview the related work. Our privacy risk model is introduced and analyzed in Section 3. Then, in Section 4 we discuss some foundations for TPNGS. In Section 5, we present our novel time-series pattern based noise generation strategy. In Section 6, we perform a simulation to demonstrate that TPNGS can improve the effectiveness of privacy protection significantly to withstand the probability fluctuation privacy risk. Finally, in Section 7, we conclude our contributions and point out future work.

2 RELATED WORK

In this section, we overview some typical privacy protection approaches, such as privacy-preserving data mining (PPDM), privacy-preserving data publishing (PPDP), private information retrieval (PIR), proxy and anonymity networks, cryptography for multi-party computation and noise obfuscation. Besides, time-series patterns and clustering algorithms are effective tools to support our TPNGS.

More and more researchers have produced remarkable research on privacy protection related to cloud. Public auditability authentication on cloud requires a higher standard of preserving privacy by provably secure data storage [8]. Similarly, data verification in cloud has to be emphasized in terms of data provability [14]. These papers show that there are various privacy protection situations on cloud, which should be addressed by specific privacy protection approaches. In the rest of this section, we discuss some typical and widely used approaches.

PPDM reveals a kind of privacy leakage in the minutiae [15]. To protect customer privacy, Evfimievski et al. [16] use a randomization operator to investigate the process of association rule mining in terms of privacy protection. In contrast, the effectiveness of perturbation [17] has been analyzed deeply in privacy-preserving data mining by maximum a posteriori estimation. Similarly, privacy-preserving data publishing (PPDP) is widely utilized in data publishing for web services [18]. When a trade-off between privacy and utility is considered [19], PPDP can be enhanced to match the pay-as-you-go style of cloud computing.

Different from PPDM and PPDP, PIR utilizes another approach to protect privacy, which mainly prevents database operators from knowing users’ sensitive records. Chor et al. [20] conclude that, to get perfect protection, a user has to query all the entries in the database when dealing with a single-server framework. Based on information theory, e-commerce considers practical PIR to enhance business processes [21]. Besides, the work on differential privacy [22] is a promising approach to protect customer privacy by pursuing anonymity.

Proxies and anonymity networks for protecting customers’ privacy have been widely discussed. The major goal is to keep anonymity or ‘invisibility’ in a complex or ‘dangerous’ network condition. For example, TOR [23] provides a sophisticated privacy protection method, making it difficult for attackers to trace the customer via network traffic analysis. In social networks [24] and encrypted communication [25], anonymous networks focus on identity anonymity by concealing related characteristics.

As analyzed in Section 1, various malicious service providers may exist in cloud environments. Some of them may record customers’ service requests and collectively deduce customers’ private information. Therefore, customers’ privacy needs to be protected without involving service providers. This is the basic scenario in this paper.

Briefly, PPDM is not an ideal choice to address this scenario because it is out of customers’ control. PIR and PPDP mainly work at server side, hence have a similar problem. Proxies and anonymity networks need service providers’ cooperation to enable such access, and have to face the possibility that this access cannot be enabled in complex cloud environments. Therefore, they are not suitable for the scenario in this paper.

Currently, in privacy protection at client side, some cryptographic methods [7], [8] have been discussed in multi-party secure computation. Although they have a strong mathematical basis, their low efficiency hinders practical use in cloud computing. Besides, the support from service providers is necessary for this approach.

Noise obfuscation is another widely adopted approach for protecting privacy. For example, location privacy protection [26] in a mobile environment is discussed by presenting a solution based on different obfuscation operators. It is clear that noise obfuscation can be utilized by cloud customers to keep their privacy safe on their own, automatically and without cloud service providers. Specifically, Ye et al. [10] investigate noise injection in privacy-aware searching by formulating noise injection as a mutual information minimization problem. Similarly, range query and KNN query services in the cloud can be enhanced by RASP data perturbation [27]. A common model is also presented in terms of obfuscation-based private web search [28]. Zhang et al. [9] present a historical probability based noise generation strategy for privacy protection in cloud computing to obtain a promising cost saving in cloud environments. Besides, a similar idea about packet padding [25] focuses on concealing fingerprints of web pages. But these approaches do not consider the probability fluctuations introduced before, which is a shortcoming in terms of privacy protection. This is what we plan to address by our novel TPNGS. In [12], a preliminary version of this paper considers these probability fluctuations to improve noise obfuscation. But it only considers fixed and pre-set time intervals as a special case in time-series pattern based noise generation, and we will generalize this in this paper.

In the scenario discussed in this paper, noise obfuscation is utilized by cloud customers at client side, which is different from other existing privacy protection approaches at server side. The efficiency of our approach is different from that at server side too. Hence, we evaluate the cost of noise obfuscation to discuss the efficiency among noise obfuscation strategies at client side, which is an important aspect of the evaluation in Section 6.

Regarding time-series patterns, a time-series pattern based algorithm [29] is presented to forecast duration intervals in scientific workflow activities. For the problem in this paper, the time-series pattern is an effective tool to forecast ‘future’ occurrence probabilities based on past occurrence probabilities in situations with probability fluctuations.

Besides, a clustering algorithm is a suitable tool to organize data to obtain inherent relations [30], and can be utilized in dynamic time interval generation as a key part of TPNGS. In this paper, a clustering algorithm generates dynamic time intervals to match various privacy attackers. Based on these time intervals, time-series patterns can be deduced from past occurrence probabilities. Jointly with current occurrence probabilities, we can then forecast ‘future’ occurrence probabilities to guide noise generation. Hence, probability fluctuations can be foreseen and addressed by noise obfuscation.

3 PRIVACY RISK MODEL

In this section, we discuss privacy risk models to analyze the withstanding between privacy attackers (PAs) and privacy protectors (PPs) about time intervals and probability fluctuations in terms of noise obfuscation. In other words, we present the privacy risk models as a more thorough investigation of the probability fluctuation privacy risk than in [12]. They are the basis of our novel TPNGS.

As introduced in Section 1, under the probability fluctuation privacy risk, how to obtain these fluctuations of occurrence probabilities is the key issue considered by PAs at server side, and by PPs at client side too. Time intervals are the time periods over which occurrence probabilities are counted, and a series of them can make up a whole probability fluctuation. Hence, time intervals can dominate the expression of probability fluctuations. In other words, different time intervals can express different probability fluctuations, even on the same service data. In the motivating example in Section 1, time intervals of ‘this month’ and ‘the next month’ can express a probability fluctuation, but a single time interval covering the two months cannot express any probability fluctuation. Besides, obtaining time intervals is the common problem of PAs and PPs: PAs utilize time intervals to find out probability fluctuations and obtain customer privacy; PPs utilize time intervals to guide noise obfuscation and keep customer privacy safe by withstanding these PAs. Hence, the key of privacy risk models is time interval generation.

As introduced in Section 1, we have three kinds of time periods in this paper, each represented as a set of 2-tuples: time elements, $TE = \{\langle 1, 2\rangle, \langle 2, 3\rangle, \ldots, \langle T', T'+1\rangle\}$ with $TE(i) = \langle i, i+1\rangle$, $i \in [1, T']$; time intervals, $TI = \{\langle t_1, t_2\rangle, \langle t_2, t_3\rangle, \ldots, \langle t_T, t_{T+1}\rangle\}$ with $TI(i) = \langle t_i, t_{i+1}\rangle$, $i \in [1, T]$; and time segments, $TS = \{\langle t'_1, t'_2\rangle, \langle t'_2, t'_3\rangle, \ldots, \langle t'_{T''}, t'_{T''+1}\rangle\}$ with $TS(i) = \langle t'_i, t'_{i+1}\rangle$, $i \in [1, T'']$. Time elements form time intervals, and time intervals form time segments. As discussed in Section 1, if PPs utilize longer time intervals than PAs, the probability fluctuation risk may jeopardize privacy protection. And if PPs utilize shorter ones, noise obfuscation may generate more noise (more cost) with a similar effectiveness of privacy protection. We will illustrate this in Section 6.
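The three nested time units can be pictured as lists of 2-tuples. The following Python sketch is an illustration only: the boundary values and the helper names `time_elements`, `consecutive_units` and `units_inside` are assumptions made for the example, not part of the paper.

```python
def time_elements(t_prime):
    """TE = {<1,2>, <2,3>, ..., <T', T'+1>} as a list of 2-tuples."""
    return [(i, i + 1) for i in range(1, t_prime + 1)]


def consecutive_units(boundaries):
    """Turn increasing boundaries t_1 < t_2 < ... into 2-tuples <t_i, t_{i+1}>."""
    return list(zip(boundaries, boundaries[1:]))


def units_inside(unit, smaller_units):
    """Return the smaller units that lie completely inside `unit`."""
    start, end = unit
    return [(s, e) for (s, e) in smaller_units if start <= s and e <= end]


# Hypothetical boundaries: six time elements, three unequal intervals, two segments.
TE = time_elements(6)
TI = consecutive_units([1, 3, 4, 7])
TS = consecutive_units([1, 4, 7])

print(units_inside(TI[0], TE))  # time elements that form the first interval
print(units_inside(TS[0], TI))  # time intervals that form the first segment
```

Sharing boundaries between the levels keeps the containment explicit: every interval boundary is also an element boundary, and every segment boundary is also an interval boundary.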

For PAs, there are two main types of time intervals, equal and unequal, and two main types of time interval generation, static and dynamic. Equal time intervals all have the same length, such as one month for all. Unequal time intervals have different lengths, such as three months, half a month and so on. Static time intervals are generated by pre-setting, without considering runtime service data. Dynamic time intervals are generated dynamically, depending on runtime service data. Therefore, for PAs, there are four types of methods for time interval generation: 1) ES: equal time intervals with static generation; 2) ED: equal time intervals with dynamic generation; 3) US: unequal time intervals with static generation; and 4) UD: unequal time intervals with dynamic generation. In other words, there are four kinds of PAs. Therefore, to withstand these PAs, PPs have to consider them all for noise obfuscation.

In Fig. 1, we introduce privacy risk models to discuss the withstanding between PPs and PAs in terms of time intervals for noise obfuscation, under the probability fluctuation privacy risk. In other words, in time interval generation for noise obfuscation and privacy protection, PPs can utilize some methods to withstand PAs’ methods.

3.1 Novel Privacy Risk Model and Its Specific Case

In the cloud, it is impossible for PPs to determine PAs’ static time intervals, or their specific methods of dynamic generation. Hence, PPs have to analyze time intervals by themselves. In other words, they need to investigate past occurrence probabilities, and generate time intervals from them. It is reasonable that these time intervals should express probability fluctuations sufficiently. As described by Model 1 in Fig. 1, this idea can utilize dynamic generation to withstand PAs’ static and dynamic generations. Specifically, we will introduce a dynamic time interval generation algorithm (CTIG) in Section 4.2, which is a fundamental part of our novel TPNGS. Based on the results of CTIG, PPs can utilize time intervals to describe probability fluctuations, and support time-series pattern forecasting to guide noise generation. Briefly, Model 1 reflects our novel privacy risk model and the primary task of this paper. Hence, we obtain Model 1, in which $S_{PA}$ is the set of privacy attackers, {ES, ED, US, UD}, and $S_{PP}$ is the corresponding set of privacy protectors, {ED, UD}:

$$\text{Model 1:}\quad \overbrace{\{ES, ED, US, UD\}}^{S_{PA}} \xrightarrow{\;map_1\;} \overbrace{\{ED, UD\}}^{S_{PP}}. \tag{1}$$

Besides, compared to Model 1, Model 2 is a specific and idealized case which requires that PPs know PAs’ static time intervals, or specific methods of dynamic generation. As described by Model 2 in equation (2), based on PAs’ time intervals, PPs can ‘accurately’ utilize them to describe probability fluctuations directly, and support time-series pattern forecasting to guide noise generation. The withstanding is therefore simple, compared to Model 1. Actually, this model has been mainly utilized in [12] as the preliminary work of this paper:

$$\text{Model 2:}\quad \overbrace{\{ES, ED, US, UD\}}^{S_{PA}} \xrightarrow{\;map_2\;} \overbrace{\{ES, ED, US, UD\}}^{S_{PP}}. \tag{2}$$

3.2 Other Possible Privacy Risk Models

Compared to Models 1 and 2, there are still some other models to describe other possible withstanding between PPs and PAs in terms of time intervals, such as Models 3 and 4 in Fig. 2. It is obvious that Models 1, 2, 3, and 4 can represent all reasonable possibilities for PPs in terms of time interval generation and noise obfuscation. But these two models are not suitable to be utilized. Model 3 means that PPs utilize static generation. Hence, they need PAs’ static time intervals. But as discussed before, PPs hardly know PAs’ preset time intervals in real cloud environments. Model 4 means that time interval generation has to focus on only one type of time intervals, equal or unequal, which is not comprehensive for representing PAs in terms of time interval generation. Models 3 and 4 are described in equations (3) and (4):

$$\text{Model 3:}\quad \overbrace{\{ES, ED, US, UD\}}^{S_{PA}} \xrightarrow{\;map_3\;} \overbrace{\begin{matrix}\{US, UD\}\\ \{ES, ED\}\end{matrix}}^{S_{PP}}, \tag{3}$$

$$\text{Model 4:}\quad \overbrace{\{ES, ED, US, UD\}}^{S_{PA}} \xrightarrow{\;map_3\;} \overbrace{\begin{matrix}\{UD\}\\ \{ED\}\end{matrix}}^{S_{PP}}. \tag{4}$$

Fig. 1. Privacy risk models in terms of time interval.


In real cloud environments, it is difficult for PPs to recognize PAs’ actions. Hence, one noise obfuscation strategy has to face various possible PAs with the different methods of time interval generation discussed before, concurrently. Specifically, based on Model 1, we will evaluate our novel strategy and design corresponding processes to illustrate the four types of PAs with their four methods for time interval generation: 1) ES: equal time intervals with static generation; 2) ED: equal time intervals with dynamic generation; 3) US: unequal time intervals with static generation; and 4) UD: unequal time intervals with dynamic generation. These will be illustrated one by one in Section 6.

4 FOUNDATION FOR TIME-SERIES PATTERN BASED NOISE GENERATION STRATEGY

To support TPNGS, we introduce some foundations in this section. First, based on existing noise injection models, we propose our time-series pattern based noise injection model. Then, based on our privacy risk model introduced in Section 3, we present our novel CTIG to generate time intervals. Lastly, based on these time intervals, we investigate time-series patterns for forecasting probability fluctuations, and present our novel time-series pattern based forecasting algorithm (TPF).

4.1 Noise Injection Model

Our time-series pattern based noise injection model is modified from [9] to fulfill our time-series pattern idea, as shown in Fig. 3.

$Q_R$: queue of the customer’s real service requests to be protected.

$Q_N$: queue of noise service requests to be injected into $Q_R$.

$Q_S$: queue of final service requests composed of $Q_R$ and $Q_N$.

$Q$: a set of service requests, $Q = \{q_1, q_2, \ldots, q_i, \ldots, q_n\}$. Every service request in $Q_R$, $Q_S$ and $Q_N$ is from this set. Hence, in the view of service providers, one request in the queue of final service requests $Q_S$ could be from the real requests $Q_R$ or the noise requests $Q_N$.

$\varepsilon$: probability for injecting $Q_N$ into $Q_S$, with $\varepsilon \in [0, 1]$. We call it the noise injection intensity.

The overall working process of the model is to inject $Q_N$ into $Q_R$ based on $\varepsilon$ so that we can get $Q_S$. The noise generation function that obtains $Q_N$ is executed whenever $Q_R$ is not empty, and every request in $Q_R$ is injected into $Q_S$ accompanied by requests from $Q_N$. Besides, supposing that $q_i$ is an item from $Q$, $P(Q_R = q_i)(t)$, $P(Q_N = q_i)(t)$ and $P(Q_S = q_i)(t)$ are the occurrence probabilities of $q_i$ in $Q_R$, $Q_N$ and $Q_S$ at time $t$, respectively.

As introduced in Section 1, to protect customers’ privacy under the probability fluctuation privacy risk, we need to achieve the state that, for all $i$, all $P(Q_S = q_i)(t+1)$ are about the same. Therefore, if our strategy forecasts that $P(Q_R = q_i)(t+1)$ has a high value, $P(Q_N = q_i)(t+1)$ needs to have a low value to achieve that state, and vice versa. This is the general process of generating noise requests based on time-series patterns.
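The injection process of this model can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the function name `inject_noise`, the slot-by-slot coin flip, and the dictionary representation of the noise generation probabilities are assumptions made for illustration only.

```python
import random


def inject_noise(real_queue, noise_probs, epsilon, rng=None):
    """Build a final queue Q_S from real requests Q_R and sampled noise Q_N.

    Assumption (not from the paper): each slot of Q_S is filled with a noise
    request with probability epsilon, otherwise with the next real request,
    so the expected share of noise in Q_S is epsilon.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1) so that Q_R can drain")
    rng = rng or random.Random()
    requests = list(noise_probs)
    weights = [noise_probs[q] for q in requests]

    final_queue = []
    i = 0
    while i < len(real_queue):
        if rng.random() < epsilon:
            # Noise slot: draw a request according to P(Q_N = q_i).
            final_queue.append(rng.choices(requests, weights=weights, k=1)[0])
        else:
            # Real slot: forward the next real request unchanged.
            final_queue.append(real_queue[i])
            i += 1
    return final_queue
```

With `epsilon = 0` the final queue equals the real queue; larger values interleave more noise while preserving the order of the real requests.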

Besides, in this model, noise generation probabilities and noise injection intensity need to be analyzed in order to obtain noise requests. Hence, based on this model, we will discuss them in Section 5.1 to present our novel TPNGS.

4.2 Cluster Based Time Interval Generation Algorithm

In this section, we introduce our novel CTIG for time-series pattern forecasting, which obtains time intervals dynamically, based on [30]. As introduced in Section 3, CTIG focuses on how to obtain time intervals that express occurrence probability fluctuations sufficiently.

As introduced in Section 3, there are two main types of time intervals: equal and unequal. Hence, our CTIG algorithm has to consider both of them. Briefly, our algorithm primarily analyzes the unequal type of time intervals and considers the equal type as a special case, so CTIG can generate both unequal and equal time intervals as needed. Besides, in this algorithm, we analyze time-series data in terms of time elements and obtain time-series data in terms of time intervals, which provides the input data to our novel time-series pattern based forecasting algorithm (TPF) described in Section 4.3.

To express probability fluctuations sufficiently, CTIG needs to maximize the disparity between occurrence probabilities at adjacent time intervals. Besides, CTIG utilizes the ‘cluster’ idea in a bottom-up approach to obtain time intervals from time elements with occurrence probabilities. Hence, for the occurrence probabilities at every time element, a small disparity between the probabilities at two adjacent time elements means that these two time elements have a high possibility of being combined into one time interval, and vice versa. That is the basic idea of our novel CTIG algorithm.

Fig. 2. Privacy risk models 3 and 4.

Fig. 3. Time-series pattern based noise injection model.

In Algorithm 1, CTIG clusters time elements to obtain time intervals with occurrence probabilities. In Stage 1, time intervals are initialized as time elements. In Stage 2, we generate time intervals by combining time elements according to the disparities of their occurrence probabilities. In this stage, function Obtain_min_t() traverses the entire time period to find the time interval that has the lowest disparity of occurrence probabilities between itself and its next time interval. Function $TIM(min\_t, TI(t), \forall i, P(Q_R = q_i)(t))$ merges two adjacent time intervals into one and recomputes the occurrence probabilities. Function OPD() determines whether the loop in Stage 2 should be terminated, using a preset parameter, distance_boundary. In Stage 3, we use function ATIE() to adjust time intervals to be equal (i.e., to have the same length). In the meantime, occurrence probabilities are adjusted along with the time interval adjustments.
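Stages 1 and 2 described above can be sketched in Python as follows. This is a simplified illustration, not Algorithm 1 itself: the disparity measure (sum of absolute differences), the length-weighted recomputation of merged probabilities, and the name `ctig_cluster` are assumptions, and the Stage 3 equalization (ATIE) is omitted.

```python
def ctig_cluster(element_probs, distance_boundary):
    """Bottom-up clustering of time elements into time intervals.

    element_probs: one probability vector per time element.
    Returns a list of (start, end, probs) tuples with inclusive bounds.
    """
    # Stage 1: every time element starts as its own time interval.
    intervals = [(t, t, list(p)) for t, p in enumerate(element_probs)]

    def disparity(a, b):
        # Assumed disparity measure between two adjacent intervals.
        return sum(abs(x - y) for x, y in zip(a[2], b[2]))

    # Stage 2: repeatedly merge the adjacent pair with the lowest disparity.
    while len(intervals) > 1:
        k = min(range(len(intervals) - 1),
                key=lambda j: disparity(intervals[j], intervals[j + 1]))
        if disparity(intervals[k], intervals[k + 1]) > distance_boundary:
            break  # remaining neighbours differ too much to be merged
        left, right = intervals[k], intervals[k + 1]
        len_l = left[1] - left[0] + 1
        len_r = right[1] - right[0] + 1
        # Recompute probabilities as a length-weighted average.
        merged = [(x * len_l + y * len_r) / (len_l + len_r)
                  for x, y in zip(left[2], right[2])]
        intervals[k:k + 2] = [(left[0], right[1], merged)]
    return intervals
```

A small `distance_boundary` keeps intervals with clearly different probabilities apart, which is the property CTIG relies on to expose fluctuations.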

Function $ATIE(\forall i, P(Q_R = q_i)(t), TI(t))$ adjusts time intervals to be equal, and we adopt the generic algorithm (GA) to adjust them step by step. Specifically, in this function, ‘long’ time intervals gradually release some margin time elements to reach the average length of existing time intervals, and adjacent ‘short’ time intervals gradually absorb these time elements to reach the average length, too. In the releasing and absorbing processes, this function adjusts occurrence probabilities as little as possible. After these processes, we obtain equal time intervals that express fluctuations of occurrence probabilities sufficiently.

In this algorithm, Stage 1 is a traversing process over all past probabilities, and its complexity is $O(T')$. Stage 2 is a loop of traversing processes, and the number of iterations depends on the specific data; in the worst case, its complexity is $O(T'^2)$. Stage 3 is a conditional process of result modification, and its worst-case complexity is $O(T^2)$. Hence, the complexity of CTIG is $O(T^2 + T'^2)$.

Besides, in the execution process of CTIG, we keep a tree structure to store the time intervals TI(t) and update them by adjusting this tree over the entire time period. Hence, in noise generation processes, CTIG only needs to adjust part of the tree rather than rebuild the whole tree, without significant cost impact on noise generation processes in comparison to other existing representative strategies.

Our novel CTIG algorithm generates time intervals with occurrence probabilities as the input of our novel TPF algorithm in Section 4.3, and these time intervals dynamically express probability fluctuations sufficiently. Besides, in the following parts of this paper, time intervals are the main ‘time’ view of occurrence probability, and all occurrence probabilities mean occurrence probabilities at time intervals.

4.3 Time-Series Pattern Based Forecasting Algorithm

In this section, we present our novel TPF algorithm for noise generation. First, we introduce an algorithm for time-series segmentation and pattern generation (TSPG). Then, based on these patterns, we introduce an algorithm for pattern matching and forecasting (PMF). Lastly, to support TPNGS, the TPF algorithm is presented. Besides, all time units in this section are time intervals as introduced before, and time segments, such as time-series patterns, are made up of time intervals.

Similar to time-series data in other time-series pattern based forecasting algorithms [29], [30], occurrence probabilities change as time passes. That is the common precondition of time-series pattern based algorithms. In this paper, occurrence probabilities are composed of the occurrence probabilities of various service requests, and each of them can be treated as a single time-series pattern based forecasting process. Therefore, in one time-series pattern based forecasting process, we execute both the TSPG and PMF algorithms to derive forecasting results. Then, we combine these processes and integrate their forecasting results to support noise generation. This is the main procedure of our novel time-series pattern based forecasting algorithm (TPF) for noise generation.

4.3.1 TSPG: Time-Series Segmenting and Pattern Generation Algorithm

Here we introduce TSPG based on [29]. In brief, TSPG divides the entire time period with past occurrence probabilities into time segments and checks their validity. Specifically, at the beginning, we utilize the bottom-up and top-down approaches to move windows over the time series to make sure that the variance of probabilities in one segment is close to, but no more than, a preset parameter that serves as the maximum boundary of variance. Then, we split the entire time period with time-series data into several time segments. Lastly, we validate them and set them as patterns by a preset parameter, Min_pattern_length, which is the minimum length of a validated pattern. Hence, the input of TSPG is the past occurrence probabilities of real requests, $P(Q_R = q_k)(t), k \in [1, n], t \in [0, T]$, and the output is a group of time segments, $Patterns[j], j \in [0, m]$. The function of TSPG is $Patterns[j], j \in [0, m] = TSPG(P(Q_R = q_k)(t), k \in [1, n], t \in [0, T])$.

In this algorithm, the main process is a traversing function. Its worst-case complexity is $O(T^2)$.

Besides, each validated pattern has an attribute, nextvalue, which is the first probability value of the next pattern or time segment after this pattern in the entire time period. It is a key attribute for forecasting in PMF, described next.
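The segmentation and validation idea can be sketched in Python as below. This is a deliberately simplified, single-pass variant rather than the TSPG of [29]: it only grows windows greedily instead of combining bottom-up and top-down moves, and the names `tspg`, `max_variance` and the dictionary layout of a pattern are assumptions for illustration.

```python
from statistics import pvariance


def tspg(probs, max_variance, min_pattern_length):
    """Split one request's probability series into segments and keep valid patterns."""
    # Grow a window until adding the next value would exceed the variance bound.
    segments = []
    start = 0
    for end in range(1, len(probs) + 1):
        if end == len(probs) or pvariance(probs[start:end + 1]) > max_variance:
            segments.append((start, end))  # half-open segment [start, end)
            start = end

    # Keep segments that are long enough and that have a following value,
    # which becomes the pattern's nextvalue attribute.
    patterns = []
    for s, e in segments:
        if e - s >= min_pattern_length and e < len(probs):
            patterns.append({"values": probs[s:e], "nextvalue": probs[e]})
    return patterns
```

The last segment of the series has no successor and therefore no nextvalue, so this sketch does not register it as a pattern.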

4.3.2 PMF: Pattern Matching and Forecasting Algorithm

Here we introduce PMF, also based on [29]. In brief, PMF utilizes the patterns resulting from TSPG to match current probabilities. If we find a matched pattern, its forecasting attribute, nextvalue, can be utilized to forecast ‘future’ probabilities. Besides, Min(abs(Patterns.mean-CP.mean)) denotes a function that returns the pattern with the minimum absolute difference of probability means between itself and CP, which denotes the current probabilities queue. Hence, the function of this algorithm can be described directly as $MP, FR = PMF(Patterns[j], j \in [0, m], CP)$: one input is the patterns that we have obtained, $Patterns[j], j \in [0, m]$, and the other input is the current probabilities queue CP; one output is the matched pattern MP, and the other output is the forecasting result FR. Our forecasting result FR is the probability that denotes the future occurrence possibility of one real service request, and it is decided by the matched pattern MP.

In this algorithm, the main process is a traversing and matching process. Its complexity is $O(T \times m)$.

Moreover, the PMF algorithm takes the mean of the current probabilities queue CP as the default value. If we cannot find a suitable pattern, the mean is used as the forecasting result FR to guide noise generation.
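The matching rule and its fallback can be sketched as follows. The `tolerance` parameter, which decides when no pattern counts as ‘suitable’, is a hypothetical addition, since the text does not state the criterion; the pattern layout (a dictionary with `values` and `nextvalue`) is likewise an assumption.

```python
from statistics import mean


def pmf(patterns, current_probs, tolerance=0.05):
    """Return (matched_pattern, forecast) for the current probabilities queue CP."""
    cp_mean = mean(current_probs)
    if not patterns:
        return None, cp_mean  # default: mean of CP

    # Min(abs(Patterns.mean - CP.mean)): pattern whose mean is closest to CP's.
    best = min(patterns, key=lambda p: abs(mean(p["values"]) - cp_mean))
    if abs(mean(best["values"]) - cp_mean) > tolerance:
        return None, cp_mean  # no suitable pattern: fall back to the mean
    return best, best["nextvalue"]
```

When a pattern matches, the forecast FR is that pattern's nextvalue; otherwise the mean of CP is returned together with `None` as the matched pattern.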

4.3.3 TPF: Time-Series Pattern Based Forecasting Algorithm

Here we present our novel TPF algorithm for noise generation. In Algorithm 2, we detail TPF based on the TSPG algorithm, which is applied as a function named TSPG(), and the PMF algorithm, which is applied as a function named PMF(). We run them on the probabilities of each service request and derive the corresponding forecasting results. After that, we normalize these forecasting results, because for a given time interval the sum of the occurrence probabilities of all service requests is 1. Besides, we denote L as the length of the current probabilities queue. In this paper, we set Min_pattern_length as its default value.

In this algorithm, we first utilize the time-series segmenting and pattern generation algorithm (TSPG) and the pattern matching and forecasting algorithm (PMF) to execute one single time-series pattern based forecasting process corresponding to one service request. For each service request, the single process has to be executed independently. Then, we combine the forecast results from these processes and normalize them into one group of ‘future’ probabilities for noise generation. Compared to [29], the novelty of the TPF algorithm lies mainly in the utilization of forecast results and their normalization for noise generation.
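The final normalization step of TPF can be sketched in Python as below. The function name `normalize_forecasts` and the uniform fallback for an all-zero forecast are assumptions for illustration; the per-request forecasts are taken as given.

```python
def normalize_forecasts(forecasts):
    """Scale independent per-request forecasts so that they sum to 1.

    forecasts: mapping from service request to its raw forecast FR.
    """
    total = sum(forecasts.values())
    if total <= 0:
        # Assumed fallback: with no usable forecast, treat requests as equally likely.
        n = len(forecasts)
        return {q: 1.0 / n for q in forecasts}
    return {q: value / total for q, value in forecasts.items()}
```

For example, raw forecasts of 0.2, 0.2 and 0.4 for three requests become 0.25, 0.25 and 0.5 after normalization.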

Besides, based on the previous discussions, the complexity of this algorithm is $O(n \times (T^2 + T \times m) + n + n) = O(n \times T \times (T + m))$.

In the TPF algorithm, the size of time segments and time-series patterns can be controlled by the parameter Min_pattern_length in TSPG, as described in previous paragraphs. A suitable Min_pattern_length provides precise forecasting results for noise obfuscation. If the forecasting is inaccurate, noise generation will perform poorly (low effectiveness and unnecessary cost). For instance, the relations between the size of time segments (controlled by Min_pattern_length) and the accuracy of forecasting have been discussed in [29]. Hence, the related conclusions give a clear picture of the relations between the size of time segments and the privacy protection performance of noise obfuscation.

5 NOVEL TIME-SERIES PATTERN BASED NOISE GENERATION STRATEGY

Based on the previous sections, in this section, we first analyze time-series pattern based noise generation in terms of noise generation probabilities and noise injection intensity, and then we propose our novel time-series pattern based noise generation strategy for privacy protection on cloud.

5.1 Time-Series Pattern Based Noise Generation

In this section, we introduce two key issues of our noise generation strategy: noise generation probabilities and noise injection intensity. In the process of noise generation, noise generation probabilities determine which kinds of noise requests should be generated, and the noise injection intensity decides how many noise requests should be generated. Besides, all discussions in this section build on time-series occurrence probabilities at time intervals, obtained by our novel CTIG algorithm presented in Section 4.2.

5.1.1 Noise Generation Probabilities

Based on [9], we discuss noise generation probabilities in TPNGS. We add the parameter time $t$ to denote the time attribute (in terms of time interval) in noise generation processes. Then, to pursue the goal of noise generation, we have the noise generation probabilities in TPNGS:

$$\forall i,\; P(Q_N = q_i)(t) = \frac{M(t) - P(Q_R = q_i)(t)}{n \times M(t) - 1}. \tag{5}$$

In equation (5), $M(t)$ is the largest $P(Q_R = q_i)(t)$ over all $i$ at time $t$:

$$M(t) = \mathrm{MAX}(\forall i,\; P(Q_R = q_i)(t)). \tag{6}$$

From equations (5) and (6), we should analyze $P(Q_R = q_i)(t)$, which is an important part of the noise generation probabilities:

$$\forall i,\; P(Q_R = q_i)(t + \Delta t) = TPFA(P(Q_R = q_i)(t'),\; t' \in [1, t]). \tag{7}$$

In equation (7), $TPFA()$ denotes the function of Algorithm 2 about TPF. Hence, equation (7) is a fundamental part of this paper: we use past requests’ probabilities to forecast future requests’ probabilities for noise generation by time-series patterns. We set $\Delta t = 1$ to derive equation (8):

$$\forall i,\; P(Q_R = q_i)(t) = TPFA(P(Q_R = q_i)(t'),\; t' \in [1, t-1]). \tag{8}$$

Combining equations (5), (6) and (8), we can get the final noise generation probabilities in TPNGS by equation (9):

$$\forall i,\; P(Q_N = q_i)(t) = \frac{\mathrm{MAX}\{\forall j,\; TPFA(j, t-1)\} - TPFA(i, t-1)}{n \times \mathrm{MAX}\{\forall j,\; TPFA(j, t-1)\} - 1}. \tag{9}$$
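Equations (5) and (9) can be evaluated with a few lines of Python. The sketch assumes the forecast probabilities of the real requests are already available as a dictionary; the function name and the uniform return value for the degenerate case, where every forecast equals $1/n$ and the formula becomes $0/0$, are assumptions made here.

```python
def noise_generation_probabilities(forecast):
    """Noise generation probabilities from forecast real-request probabilities.

    forecast: mapping q_i -> forecast P(Q_R = q_i)(t), summing to 1.
    """
    n = len(forecast)
    m = max(forecast.values())  # M(t) in equation (6)
    denom = n * m - 1
    if denom <= 1e-12:
        # Assumed handling: real probabilities are already uniform, so any
        # noise distribution works; return the uniform one.
        return {q: 1.0 / n for q in forecast}
    return {q: (m - p) / denom for q, p in forecast.items()}
```

For a forecast of 0.5, 0.3 and 0.2, the noise probabilities are 0, 0.4 and 0.6: the request forecast to be most frequent receives no noise, and rarer requests receive more.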

5.1.2 Noise Injection Intensity

To withstand the probability fluctuation privacy risk in this paper, we try to get final ‘indistinguishable’ probabilities by

$$\forall i,\; \forall t,\; P(Q_S = q_i)(t) = 1/n. \tag{10}$$

Equation (10) means that the occurrence probabilities of final service requests at all time intervals are about the same, which is the goal of privacy protection addressed in this paper. From the noise injection model presented in Section 4.1, we have

$$\forall i,\; P(Q_S = q_i)(t) = (1 - \varepsilon) P(Q_R = q_i)(t) + \varepsilon P(Q_N = q_i)(t). \tag{11}$$

Combining equations (10) and (11), we can derive the noise injection intensity $\varepsilon$ by

$$\varepsilon(t) = 1 - \frac{1}{n \times M(t)}. \tag{12}$$
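The step from equations (10) and (11) to equation (12) can be spelled out as follows. This is a reconstruction that assumes equation (5) is used for the noise generation probabilities; $P_i$ and $M$ are shorthand introduced here for $P(Q_R = q_i)(t)$ and $M(t)$.

```latex
\begin{align*}
\frac{1}{n} &= (1-\varepsilon)\,P_i + \varepsilon\,\frac{M - P_i}{nM - 1}
  && \text{(equations (10), (11) and (5))}\\
            &= \Bigl(1 - \varepsilon - \frac{\varepsilon}{nM - 1}\Bigr) P_i
               + \frac{\varepsilon M}{nM - 1}.
\end{align*}
% The left-hand side does not depend on i, so the coefficient of P_i must vanish:
\begin{align*}
1 - \varepsilon - \frac{\varepsilon}{nM - 1} = 0
  \;\Longrightarrow\; \varepsilon = \frac{nM - 1}{nM} = 1 - \frac{1}{nM},
\qquad
\frac{\varepsilon M}{nM - 1} = \frac{1}{n}.
\end{align*}
```

The remaining constant term equals $1/n$ for this value of $\varepsilon$, so equation (10) holds for every $i$.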

To realize equation (12), we have

$$\varepsilon(t) = 1 - \frac{1}{n \times \mathrm{MAX}\{TPFA(\forall i,\; P(Q_R = q_i)(t'),\; t' \in [1, t-1])\}}. \tag{13}$$

Equations (9) and (13) enable our whole strategy to reach its goal, i.e., equation (10).
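A short numerical check of this claim, written as a hedged Python sketch: it computes the noise injection intensity from equation (12), the noise generation probabilities from equation (5), and mixes them according to equation (11). The function names and the example probabilities are illustrative assumptions, and the real-request probabilities are assumed to be forecast exactly.

```python
def noise_injection_intensity(real_probs):
    """Equation (12): epsilon(t) = 1 - 1 / (n * M(t))."""
    n = len(real_probs)
    m = max(real_probs.values())
    return 1.0 - 1.0 / (n * m)


def final_probabilities(real_probs):
    """Mix real and noise probabilities as in equation (11)."""
    n = len(real_probs)
    m = max(real_probs.values())
    eps = noise_injection_intensity(real_probs)
    if eps <= 1e-12:
        return dict(real_probs)  # already uniform: no noise is injected
    # Equation (5) for the noise generation probabilities.
    noise = {q: (m - p) / (n * m - 1) for q, p in real_probs.items()}
    return {q: (1 - eps) * real_probs[q] + eps * noise[q] for q in real_probs}


example = {"a": 0.5, "b": 0.3, "c": 0.2}
mixed = final_probabilities(example)  # each value is close to 1/3
```

In this example the intensity is $1 - 1/(3 \times 0.5) = 1/3$, and every request ends up with a final probability of about $1/3$, as equation (10) requires.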

Compared to existing noise generation strategies, such as HPNGS or random generation, TPNGS strengthens the goal of privacy protection from $\forall i, P(Q_S = q_i) = 1/n$ to equation (10), so it can address the privacy risk identified in Section 1. Besides, the goal of TPNGS, i.e., realization of equation (10), is a sufficient condition for the goal of existing strategies, $\forall i, P(Q_S = q_i) = 1/n$: if the occurrence probabilities are about the same at every time interval, these probabilities will be about the same over the overall time period.

5.2 Time-Series Pattern Based Noise Generation Strategy

In this section, we present our novel time-series pattern based noise generation strategy, TPNGS, as the key contribution of this paper.

In Algorithm 3, compared to existing representative noise generation strategies, the major improvements of our novel TPNGS are to use $\forall i, P(Q_R = q_i)(t), t \in [1, T'] = CTIG(\forall i, P(Q_R = q_i)(t), t \in [1, T])$ and $\forall i, TPFA(P(Q_R = q_i)(t'), t' \in [1, T])$ as data preprocessing and pattern forecasting in Step 2. As stated earlier, the CTIG algorithm is the novel cluster based time interval generation algorithm, which focuses on time interval generation to support time-series pattern forecasting. The TPF algorithm is the novel time-series pattern forecasting algorithm for noise generation, which utilizes time-series patterns to summarize past probabilities and forecast ‘future’ probabilities. In this strategy, based on the data collected in Step 1, we execute the CTIG algorithm and the TPF algorithm consecutively in Step 2, and utilize the results of the TPF algorithm in later steps (Step 3 and Step 4) to compute noise generation probabilities and noise injection intensity. In Step 5, the noise injection processes are executed. Briefly, unlike existing strategies such as HPNGS or random generation, our novel strategy considers probability fluctuations for noise obfuscation and privacy protection.

In this strategy, the key step is Step 2. Based on the previous discussion on CTIG and TPF, the complexity of TPNGS is $O(T'^2 + n \times T \times (T + m))$. By considering the numbers of time elements, time intervals and time-series patterns ($T' \gg T$ and $T \gg m$), the final complexity is $O(T'^2)$ ($n$ is the size of the real request set and can be preset or controlled before TPNGS’s execution; hence, it is a constant in this complexity). Compared to existing time-series forecasting algorithms, such as [29] with complexity $O(T')$, our novel CTIG requires extra traversing in the whole execution of TPNGS, which explains the quadratic complexity.

Besides, in the extreme case without probability fluctuations, TPNGS and HPNGS would perform similarly in noise generation, because there is nothing to forecast.

6 EVALUATION

In this section, we present an experimental simulation in our cloud simulation system called SwinCloud [31] (Swinburne cloud simulation environment). Its aim is to simulate our novel TPNGS and demonstrate that TPNGS can significantly improve the effectiveness of privacy protection by noise obfuscation under the probability fluctuation privacy risk, compared to other existing representative noise generation strategies.

As described in Section 3, cloud computing brings privacy protection and noise obfuscation into much more complex environments than ever before. In this section, we demonstrate simulation processes by focusing on privacy attackers’ time intervals that are ‘unknown’ to privacy protectors, according to Model 1 in Fig. 1 and equation (1). It can be viewed as the generalized case of [12], as discussed in Section 3.

6.1 Simulation Background and Environment

SwinCloud is a cloud computing simulation environment [31]. It is built on the computing facilities at Swinburne University of Technology, where the functions of VMWare offer unified computing and storage resources. In this environment, we set some nodes to represent a privacy protector at the client side and four privacy attackers at the server side. The privacy protector node produces a queue of real customer service requests, and utilizes noise generation strategies to generate a queue of noise service requests. These service requests are combined and sent to all privacy attacker nodes. The four privacy attacker nodes execute the four different methods of time interval generation addressed in Section 3 to comprehensively analyze the effectiveness of privacy protection on noise obfuscation in cloud computing. Besides, we use our private cloud environment (SwinCloud) for executing the simulation so as to control the entire confrontation process of privacy protection.

6.2 Simulation Process

In [12], a preliminary version of TPNGS without CTIG has been simulated and evaluated. In this section, with the novel CTIG algorithm, we can obtain a generalized simulation evaluation on noise obfuscation in terms of time interval generation to evaluate the updated TPNGS. As discussed in Section 3, there are four types of privacy attackers, {ES, ED, US, UD}, and we evaluate how TPNGS withstands these four one by one.

To demonstrate that TPNGS can withstand the probability fluctuation privacy risk effectively, the simulation process is to compute and compare the privacy protection effectiveness of TPNGS with that of HPNGS. At the end of this section, we will demonstrate the comparison among TPNGS, HPNGS and random generation.

We use the function $EPP(Strategy, t) = VAR(Strategy, t)$ to measure the effectiveness of privacy protection on noise obfuscation when comparing TPNGS and HPNGS:

$$VAR(Strategy, t) = \sum_{\forall i,\, q_i \in Q} \left( P(Q_S = q_i)(t) - \frac{1}{\|Q\|} \right)^2. \tag{14}$$

In equation (14), $VAR(Strategy, t)$ is the variance of all occurrence probabilities of requests in $Q_S$ under Strategy at time $t$. A different Strategy can produce different $P(Q_S = q_i)(t)$. A low variance of all probabilities denotes that all occurrence probabilities of final requests are about the same, and customer privacy is safe as addressed in Section 1. Therefore, the lower $EPP(Strategy, t)$ is, the better the effectiveness of privacy protection under Strategy at every time $t$. Besides, the parameter time $t$ in this section builds on time intervals as time instances, from time (interval) 0 to any time (interval) $T_A$ needed.
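Equation (14) translates directly into code. The sketch below assumes that the final probabilities for one time interval are given as a dictionary with one entry per request in $Q$; the function name `epp` is illustrative.

```python
def epp(final_probs):
    """EPP(Strategy, t): sum of squared deviations of P(Q_S = q_i)(t) from 1/||Q||."""
    n = len(final_probs)  # ||Q||, the size of the request set
    return sum((p - 1.0 / n) ** 2 for p in final_probs.values())
```

Perfectly uniform final probabilities give an EPP of 0, while the unprotected distribution 0.5, 0.3, 0.2 gives $7/150 \approx 0.0467$.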

As discussed in Section 3, we use four privacy attackers in terms of time intervals to obtain EPP one by one, and comprehensively evaluate the effectiveness of privacy protection on noise obfuscation.

Before simulation, we randomly generate a service queue as the real service queue to avoid biased fluctuations and the limited small samples of specific customers. Besides, public HTTP query data sets in network research, such as UC Berkeley Home IP Web Traces (footnote 1) and NASA-HTTP (footnote 2), are hard to separate by specific customer due to anonymity, and a mixture of different customers’ requests is not suitable for our client-side privacy protection (e.g., TPNGS).

1. http://ita.ee.lbl.gov/html/contrib/UCB.home-IP-HTTP.html.

2. http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html.

Besides, our request data is executed over a time period. Hence, we discuss related simulation results in terms of time sequence and general trends, not just at specific times based on random request data.

Compared to the preliminary version of this paper [12], CTIG’s execution is the main difference in the execution of the entire TPNGS. In particular, in the execution of CTIG and TPNGS, as presented in Section 4.2, we keep a tree structure to store time intervals and update them by adjusting this tree over the entire time period. Hence, in noise generation processes, CTIG and TPNGS only need to adjust part of the tree, without significant cost impact on noise generation processes.

6.3 Simulation Results and Analysis

As introduced in Section 3, there are four different privacy attackers that should be discussed separately from the perspective of time interval generation. So, there are four EPP comparisons: 1) UD, unequal time intervals with dynamic generation; 2) ED, equal time intervals with dynamic generation; 3) US, unequal time intervals with static generation; and 4) ES, equal time intervals with static generation. In Figs. 4, 5, 6 and 7, TPNGS and HPNGS are compared in terms of the effectiveness of privacy protection, respectively. The horizontal and vertical coordinates are time $t$ and EPP, and $T_A = 5{,}000$, i.e., $t \in [0, 5{,}000]$.

In Fig. 4, we can see that as time t passes, both EPP(HPNGS,t) and EPP(TPNGS,t) keep a similar pattern of fluctuation in different zones. EPP(HPNGS,t) fluctuates mainly between 2.00E-04 and 1.25E-04, while EPP(TPNGS,t) fluctuates between 1.25E-04 and 5.00E-05. Hence, EPP(TPNGS,t) is about 1/2 of EPP(HPNGS,t) in the figure. Therefore, we can conclude that TPNGS significantly improves the effectiveness of privacy protection over existing HPNGS when privacy attackers utilize UD.

In Fig. 5, we can see that as time t passes, both EPP(HPNGS,t) and EPP(TPNGS,t) keep a similar pattern of fluctuation in different zones. EPP(HPNGS,t) fluctuates mainly between 1.75E-04 and 1.00E-04, while EPP(TPNGS,t) fluctuates between 6.00E-05 and 4.00E-05. Hence, EPP(TPNGS,t) is about 1/3 of EPP(HPNGS,t) in the figure. Therefore, we can conclude that TPNGS significantly improves the effectiveness of privacy protection over existing HPNGS when privacy attackers utilize ED.

From Figs. 4 and 5, we can see that TPNGS significantly improves the effectiveness of privacy protection to withstand privacy attackers with dynamic time interval generation. In the following Figs. 6 and 7, we discuss the other side: static time interval generation.

In Fig. 6, we can see that as time t passes, both EPP(HPNGS,t) and EPP(TPNGS,t) keep a similar pattern of fluctuation in similar zones. They both fluctuate mainly between 0.000402 and 0.000398. Therefore, in the case of privacy attackers utilizing US, TPNGS performs similarly to existing HPNGS in terms of the effectiveness of privacy protection.

In Fig. 7, we can see that as time t passes, both EPP(HPNGS,t) and EPP(TPNGS,t) keep a similar pattern of fluctuation in similar zones. They both fluctuate mainly between 0.000403 and 0.000396. Therefore, in the case of privacy attackers utilizing ES, TPNGS performs similarly to existing HPNGS in terms of the effectiveness of privacy protection.

Fig. 4. EPP Comparison in UD.

Fig. 5. EPP Comparison in ED.

Fig. 6. EPP Comparison in US.

Fig. 7. EPP Comparison in ES.

Generally speaking, in the above four cases, TPNGS significantly improves the effectiveness of privacy protection in the two cases of time intervals with dynamic generation. In the other two cases of time intervals with static generation, TPNGS performs similarly to HPNGS. Our novel CTIG algorithm is a dynamic algorithm to generate time intervals, and it matches ‘dynamic’ privacy attackers better. For ‘static’ privacy attackers, time intervals with static generation depend on privacy attackers’ presettings, which are random. That is why TPNGS performs differently in these cases.

As discussed in Section 4.2, dynamic time interval generation focuses on sufficiently expressing probability fluctuations, which constitute customer privacy in dynamic occurrence probabilities. Hence, to some extent, it is a suitable choice for privacy attackers to break existing noise obfuscation and obtain customer privacy. Meanwhile, given the dynamic feature of time-series data, static time interval generation is a non-optimized tool to obtain probability fluctuations. Briefly, the two privacy attackers with dynamic time interval generation are the primary privacy issues under the probability fluctuation privacy risk addressed in this paper, and our novel TPNGS significantly improves the effectiveness of privacy protection in these primary privacy issues. Hence, we can reach the conclusion that TPNGS significantly improves the effectiveness of privacy protection on noise obfuscation compared with HPNGS.

Besides, to evaluate noise obfuscation comprehensively, we have to consider the cost of noise obfuscation. As introduced before, noise injection intensity is the probability for injecting $Q_N$ into $Q_S$. It is the proportion of noise requests in final requests, and we can use it to describe the cost of noise obfuscation. As introduced in Section 4.2, CTIG produces two kinds of time intervals, unequal and equal, which leads to two different types of noise generation processes with different noise injection intensities. Hence, in Figs. 8 and 9, we compare TPNGS and HPNGS in terms of cost by $\varepsilon$.

In Fig. 8, in the case of unequal time intervals, we can see that in the whole simulation process, the noise injection intensities of TPNGS are smaller than those of HPNGS. They fluctuate at the levels of about 0.5 and 0.7, respectively. Hence, TPNGS can decrease the cost of noise obfuscation.

In Fig. 9, in the case of equal time intervals, we can see that in the whole simulation process, the noise injection intensities of TPNGS are also smaller than those of HPNGS. They fluctuate at the levels of about 0.5 and 0.75, respectively. Again, TPNGS can decrease the cost of noise obfuscation.

In brief, our novel TPNGS can significantly reduce the cost of noise obfuscation compared to HPNGS. The primary reason is that TPNGS uses sliding windows (time intervals and time segments) to analyze time-series data, unlike HPNGS, which only considers the entire time period as one whole sliding window. These windows make TPNGS noise generation more precise, because it is based on specific time units. That is why TPNGS achieves a lower noise cost than HPNGS in this simulation.

To consider the cost of TPNGS execution, we can discuss its worst case, i.e., all requests have different patterns. Because pattern generation is precomputed, in noise generation processes TPNGS only needs to traverse all patterns or requests to find the matched one. Hence, this cost would not influence noise generation processes significantly, compared to other existing representative strategies.

Besides, as introduced in Section 3, PPs may use time intervals that are inappropriate compared to those of PAs. In Table 1, we design a simulation (forcibly adjusting the results of CTIG for TPNGS): as RTIL (the ratio of the time intervals’ length in PPs to the time intervals’ length in PAs) decreases, EPP increases to 1, and $\varepsilon$ increases continually, as discussed in Section 3. The column marked grey shows the original results.

In summary, our novel time-series pattern based noise generation strategy can significantly improve the effectiveness of privacy protection on noise obfuscation with a decreased noise cost, in comparison to the historical probability based noise generation strategy.

Regarding random generation of noise [10], its effectiveness of privacy protection has been discussed in [9]. So, it is clear that TPNGS can improve the effectiveness of privacy protection over HPNGS, which mainly decreases the cost of noise obfuscation relative to random generation. Therefore, our novel TPNGS can improve the effectiveness of privacy protection on noise obfuscation over existing representative noise generation strategies (i.e., HPNGS and random generation), as well as decrease the cost of noise obfuscation.

Fig. 8. Comparison on noise injection intensity in unequal time intervals.

Fig. 9. Comparison on noise injection intensity in equal time intervals.

TABLE 1. RTIL’s influences on TPNGS

7 CONCLUSIONS AND FUTURE WORK

In open and virtualized cloud environments, some malicious service providers may focus on customer service data and collectively deduce customer privacy without permission. Noise obfuscation is an effective approach in this regard. For example, it generates and injects noise service requests into real ones to ensure that their occurrence probabilities are about the same, so that service providers cannot distinguish which requests are real ones. However, existing representative noise generation strategies have not considered occurrence probability fluctuations. In fact, such occurrence probabilities could fluctuate in some time segments of the entire time period, which cannot be concealed by existing noise obfuscation. Hence, malicious service providers are still able to deduce customer privacy from these probability fluctuations. To address this probability fluctuation privacy risk, we developed a novel time-series pattern based noise generation strategy for privacy protection on cloud. In this strategy, first, under our privacy risk model in terms of time interval, we presented a novel CTIG to generate time intervals for time-series pattern forecasting. Then, based on these time intervals, we proposed a novel time-series pattern based forecasting algorithm (TPF) for noise obfuscation to abstract time-series patterns from past probabilities and forecast future probabilities. Lastly, based on these forecast results, we designed our novel TPNGS for withstanding the probability fluctuation privacy risk. The simulation evaluation demonstrated that our novel strategy could cope with these fluctuations very well, i.e., significantly improve the effectiveness of privacy protection. Besides, the cost of noise requests could be decreased by TPNGS, too.

In the future, based on TPNGS, we plan to investigate how to protect customer privacy in the scenario where multiple malicious service providers may collaborate with each other to threaten noise obfuscation.

ACKNOWLEDGMENTS

The research work reported in this paper is partly supported by the Australian Research Council under LP110100228, the National Natural Science Foundation of China (NSFC) under No. 61300042 and No. 61021004, and the Shanghai Knowledge Service Platform Project ZF1213. Xiao Liu conducted this work at Swinburne University of Technology. We would like to express our deep gratitude to A/Prof. Jinjun Chen from University of Technology, Sydney, for his stimulating suggestions in this paper. Yun Yang is the corresponding author of this paper.

REFERENCES

[1] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic,“Cloud Computing and Emerging IT Platforms: Vision, Hype,and Reality for Delivering Computing as the 5th Utility,” FutureGeneration Computer Systems, vol. 25, no. 6, pp. 599-616, 2009.

[2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Kon-winski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M. Zaha-ria, “Above the Clouds: A Berkeley View of Cloud Computing,”Comm. ACM, vol. 53, no. 6, pp. 50-58, 2010.

[3] W. Jansen and G. Timothy, Guidelines on Security and Privacy inPublic Cloud Computing. Nat’l Inst. Standard and Technology, Spe-cial Publication 800-144, Dec. 2011.

[4] M.D. Ryan, “Cloud Computing Privacy Concerns on our Door-step,” Comm. ACM, vol. 54, no. 1, pp. 36-38, 2011.

[5] S. Sackmann, J. Str€uker, and R. Accorsi, “Personalization in Privacy-Aware Highly Dynamic Systems,” Comm. ACM, vol. 49, no. 9,pp. 32-38, 2006.

[6] C.P. Pfleeger and S.L. Pfleeger, Security in Computing. fourth ed.,Prentice Hall, 2006.

[7] R. Canetti, B. Riva, and G.N. Rothblum, “Practical Delegation ofComputation Using Multiple Servers,” Proc. 18th ACM Conf. Com-puter Comm. Security, pp. 445-454, Oct. 2011.

[8] C. Wang, S.S.M. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy-Preserving Public Auditing for Secure Cloud Storage,” IEEETrans. Computers, vol. 62, no. 2, pp. 362-375, Feb. 2013.

[9] G. Zhang, Y. Yang, and J. Chen, “A Historical Probability BasedNoise Generation Strategy for Privacy Protection in CloudComputing,” J. Computer and System Sciences, vol. 78, no. 5, pp. 1374-1381, 2012.

[10] S. Ye, F. Wu, R. Pandey, and H. Chen, “Noise Injection for SearchPrivacy Protection,” Proc. Int’l Conf. Computational Science and Eng.(CSE ’09), pp. 1-8, Aug. 2009.

[11] M. Cardosa, A. Singh,H. Pucha, andA. Chandra, “Exploiting Spatio-Temporal Tradeoffs for Energy-Aware MapReduce in the Cloud,”IEEE Trans. Computers, vol. 61, no. 12, pp. 1737-1751, Dec. 2012.

[12] G. Zhang, Y. Yang, X. Liu, and J. Chen, “A Time-Series PatternBased Noise Generation Strategy for Privacy Protection in CloudComputing,” Proc. 12th IEEE/ACM Int’l Symp. Cluster, Cloud andGrid Computing (CCGrid ’12), pp. 458-465, May 2012.

[13] D. Moore, C. Shannon, D.J. Brown, G.M. Voelker, and S. Savage,“Inferring Internet Denial-of-Service Activity,” ACM Trans. Com-puter Systems, vol. 24, no. 2, pp. 115-139, 2006.

[14] Y. Zhu, H. Hu, G.-J. Ahn, and M. Yu, “Cooperative Provable DataPossession for Integrity Verification in Multi-Cloud Storage,”IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 12,pp. 2231-2244, Dec. 2012.

[15] R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,”ACM SIGMOD Record, vol. 29, no. 2, pp. 439-450, 2000.

[16] A. Evfimievski, J. Gehrke, and R. Srikant, “Limiting PrivacyBreaches in Privacy Preserving Data Mining,” Proc. 22nd ACMSIGMOD-SIGACT-SIGART Symp. Principles of Database Systems(PODS ’03), pp. 211-222, June 2003.

[17] Y. Sang, H. Shen, and H. Tian, “Effective Reconstruction of DataPerturbed by Random Projections,” IEEE Trans. Computers,vol. 61, no. 1, pp. 101-117, Jan. 2012.

[18] B.C.M. Fung, K. Wang, R. Chen, and P.S. Yu, “Privacy-PreservingData Publishing: A Survey of Recent Developments,” ACM Com-puting Surveys, vol. 42, no. 4, pp. 1-53, 2010.

[19] V.Rastogi, D. Suciu, and S. Hong, “The Boundary between Pri-vacy and Utility in Data Publishing,” Proc. 33rd Int’l Conf. VeryLarge Data Bases (VLDB ’07), pp. 531-542, Sep. 2007.

[20] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “PrivateInformation Retrieval,” J. ACM, vol. 45, no. 6, pp. 965-981, 1998.

[21] R. Henry, F. Olumofin, and I. Goldberg, “Practical PIR for Elec-tronic Commerce,” Proc. 18th ACM Conf. Computer and Comm.Security (CCS ’11), pp. 677-690, Oct. 2011.

[22] H. Park and K. Shim, “Approximate Algorithms for K-Anonym-ity,” Proc. ACM SIGMOD, pp. 67-78, June 2007.

[23] D. Rogerm, N. Mathewson, and P. Syverson, “Tor: The Second-Generation Onion Router,” Proc. Third USENIX Security Symp.,pp. 303-320, Aug. 2004.

[24] A. Narayanan and V. Shmatikov, “De-Anonymizing SocialNetworks,” Proc. 30th IEEE Symp. Security and Privacy, pp. 173-187, May 2009.

[25] S. Yu, G. Zhao, W. Dou, and S. James, “Predicted Packet Paddingfor Anonymous Web Browsing Against Traffic Analysis Attacks,”IEEE Trans. Information Forensics and Security, vol. 7, no. 4,pp. 1381-1393, Aug. 2012.

[26] C.A. Ardagna, M. Cremonini, S. de Capitani di Vimercati, and P.Samarati, “An Obfuscation-Based Approach for Protecting Loca-tion Privacy,” IEEE Trans. Dependable and Secure Computing, vol. 8,no. 1, pp. 13-27, Jan./Feb. 2011.


[27] H. Xu, S. Guo, and K. Chen, “Building Confidential and EfficientQuery Services in the Cloud with RASP Data Perturbation,” IEEETrans. Knowledge and Data Eng., vol. 26, no. 2, pp. 322-335, Feb.2013.

[28] E. Balsa, C. Troncoso, and C. Diaz, “OB-PWS: Obfuscation-BasedPrivate Web Search,” Proc. IEEE Symp. Security and Privacy,pp. 491-505, May 2012.

[29] X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, and Y. Yang, “ANovel Statistical Time-Series Pattern Based Interval ForecastingStrategy for Activity Durations in Workflow Systems,” J. Systemsand Software, vol. 84, no. 3, pp. 354-376, 2011.

[30] C.C. Aggarwal and P.S. Yu, “A Framework for Clustering Uncer-tain Data Streams,” Proc. IEEE 24th Int’l Conf. Data Eng., pp. 150-159, Apr. 2008.

[31] X. Liu, D. Yuan, G. Zhang, W. Li, D. Cao, Q. He, J. Chen, and Y.Yang, “The Design of Cloud Workflow Systems: Architecture,”Functionality and Quality of Service Springer Briefs, Springer, 2012.

Gaofeng Zhang received the BEng and MEng degrees in computer science from Hefei University of Technology, China, in 2005 and 2008, respectively. He received the PhD degree from the Faculty of Information and Communication Technologies, Swinburne University of Technology, Melbourne, Australia, in 2013, under the supervision of Prof. Y. Yang from Swinburne University of Technology and A/Prof. Jinjun Chen from University of Technology, Sydney. His research interests include privacy protection strategy, security mechanism in cloud computing and big data.

Xiao Liu received the master’s and bachelor’s degrees from the School of Management, Hefei University of Technology, Hefei, China, in 2007 and 2004, respectively, all in information management and information system. He received the PhD degree in computer science and software engineering from the Faculty of Information and Communication Technologies, Swinburne University of Technology, Melbourne, Australia, in 2011. He is currently with the Software Engineering Institute, East China Normal University, Shanghai, China. Before joining ECNU, he was a postdoctoral research fellow and sessional lecturer in the Centre of Computing and Engineering Software System at Swinburne University of Technology. His research interests include workflow management systems, cloud computing, scientific workflow, business process management and quality of service.

Yun Yang received the BS degree from Anhui University, Hefei, China, in 1984, the MEng degree from the University of Science and Technology of China, Hefei, China, in 1987, and the PhD degree from the University of Queensland, Brisbane, Australia, in 1992, all in computer science. He is currently a full professor in the School of Software and Electrical Engineering at Swinburne University of Technology, Melbourne, Australia. Prior to joining Swinburne as an associate professor, he was a lecturer and senior lecturer at Deakin University during 1996-1999. Before that, he was a (senior) research scientist at DSTC Cooperative Research Centre for Distributed Systems Technology during 1993-1996. He also worked at the Beijing University of Aeronautics and Astronautics during 1987-1988. He has coauthored four books and published more than 200 papers in journals and refereed conferences. His current research interests include cloud computing, software technologies, cloud/grid/p2p workflow systems, and service-oriented computing.
