PAM_ an Efficient and Privacy Aware Framework for Continiously Moving Objects

8/8/2019 PAM_ an Efficient and Privacy Aware Framework for Continiously Moving Objects

1/17

1

PAM: An Efficient and Privacy-Aware MonitoringFramework for Continuously Moving Objects

Haibo Hu, Jianliang Xu, Senior Member, IEEE, and Dik Lun Lee

AbstractEfficiency and privacy are two fundamental issues in moving object monitoring. This paper proposes a privacy-aware

monitoring (PAM) framework that addresses both issues. The framework distinguishes itself from the existing work by being the first to

holistically address the issues of location updating in terms of monitoring accuracy, efficiency and privacy, in particular when and how

mobile clients should send location updates to the server. Based on the notions of safe region and most probable result, PAM performs

location updates only when they would likely alter the query results. Furthermore, by designing various client update strategies, the

framework is flexible and able to optimize accuracy, privacy or efficiency. We develop efficient query evaluation/reevaluation and safe

region computation algorithms in the framework. The experimental results show that PAM substantially outperforms traditional schemes

in terms of monitoring accuracy, CPU cost and scalability while achieving close-to-optimal communication cost.

Index Termsspatial databases, location-dependent and sensitive, mobile applications

3

1 INTRODUCTION

In mobile and spatio-temporal databases, monitoringcontinuous spatial queries over moving objects is neededin numerous applications such as public transportation,logistics, and location-based services. Fig. 1 shows a typi-cal monitoring system, which consists of a base station, adatabase server, application servers, and a large numberof moving objects (i.e., mobile clients). The databaseserver manages the location information of the objects.The application servers gather monitoring requests andregister spatial queries at the database server, which then

continuously updates the query results until the queriesare deregistered.

MobileClient

BaseStation DatabaseServer ApplicationServers

LocationUpdate ResultUpdate

RegisterQuery

Fig. 1. The System Architecture

The fundamental problem in a monitoring system iswhen and how a mobile client should send location

updates to the server because it determines three prin-cipal performance measures of monitoring accuracy,efficiency and privacy. Accuracy means how often themonitored results are correct, and it heavily dependson the frequency and accuracy of location updates.As for efficiency, two dominant costs are: the wirelesscommunication cost for location updates and the query

Haibo Hu and Jianliang Xu are with the Hong Kong Baptist University,Hong Kong SAR, China. Dik Lun Lee is with the Hong Kong University ofScience and Technology, Hong Kong SAR, China.This work was supported by the Research Grants Council, Hong KongSAR, China under Project No. HKBU211206, HKBU211307, HKBU210808,

HKBU1/05C, RGC GRF 615806 and CA05/06.EG03.

evaluation cost at the database server, both of whichdepend on the frequency of location updates. As forprivacy, the accuracy of location updates determineshow much the clients privacy is exposed to the server.

In the literature, very few studies on continuous querymonitoring are focused on location updates. Two com-monly used updating approaches are periodic update(every client reports its new location at a fixed inter-val) and deviation update (a client performs an updatewhen its location or velocity changes significantly) [24],[32], [35], [47]. However, These approaches have several

deficiencies. First, the monitoring accuracy is low: queryresults are correct only at the time instances of periodicupdates, but not in between them or at any time ofdeviation updates. Second, location updates are per-formed regardless of the existence of queries a highupdate frequency may improve the monitoring accuracy,

but is at the cost of unnecessary updates and queryreevaluation. Third, the server workload using periodicupdate is not balanced over time: it reaches the peakwhen updates arrive (they must arrive simultaneouslyfor correct results) and trigger query reevaluation, butis idle for the rest of the time. Last, the privacy issueis simply ignored by assuming the clients are always

willing to provide their exact positions to the server.Some recent work attempted to remedy the privacy

issue. Location cloaking was proposed to blur the exactclient positions into bounding boxes [18], [14], [31],[26]. By assuming a centralized and trustworthy third-party server that stores all exact client positions, variouslocation cloaking algorithms were proposed to build the

bounding boxes while achieving the privacy measuresuch as k-anonymity. However, the use of bounding

boxes makes the query results no longer unique. Assuch, query evaluation in such uncertain space is morecomplicated. A common approach is to assume that the


2/17

2

Q(NN)

a

bboundingbox

saferegion

exactposition

Fig. 2. Monitoring Example

probability distribution of the exact client location in

the bounding box is known and well-formed. Therefore,the results are defined as the set of all possible resultstogether with their probabilities [14], [31], [7]. However,all these approaches focused on one-time cloaking orquery evaluation; they cannot be applied to monitor-ing applications where continuous location update isrequired and efficiency is a critical concern.

In [21], we proposed a monitoring framework wherethe clients are aware of the spatial queries being mon-itored, so they send location updates only when theresults for some queries might change. Our basic idea isto maintain a rectangular area, called safe region, for eachobject. The safe region is computed based on the queriesin such a way that the current results of all queriesremain valid as long as all objects reside inside their re-spective safe regions. A client updates its location on theserver only when the client moves out of its safe region.This significantly improves the monitoring efficiency andaccuracy compared to the periodic or deviation updatemethods. However, this framework fails to address theprivacy issue, that is, it only addresses when but nothow the location updates are sent.

In this paper, we take a more comprehensive ap-proach instead of dealing with when and howseparately like most existing work, we propose a PAM

(privacy-aware monitoring) framework that incorporatesthe accuracy, efficiency and privacy issues altogether.We adapt for the monitoring environment the privacymodel that has been employed by location cloakingand other privacy-aware approaches. More specifically,a client encapsulates its exact position in a bounding

box, and the timing and mechanism with which the boxis updated to the server are decided by a client-sidelocation updater as part of PAM.

However, the integration of privacy into the monitor-ing framework poses challenges to the design of PAM.First, with the introduction of bounding boxes, the resultof a query is no longer unique. Among all possible

results, we argue that the most probable result, i.e., theone with the highest probability, is most promising forapproximating the genuine result (the result derived

based on the exact positions). The probability is com-puted by assuming a uniform distribution of the exactclient position in the bounding box. Fig. 2 shows twoclients a, b together with their bounding boxes. Both thegenuine and most probable result for the 1NN query Qare {a}. However, even monitoring only the most prob-able result adds great complexity to query evaluation.As such, one of the main contributions of this paperis to devise efficient query processing algorithms for

common spatial query types. Second, the most probableresult also adds complexity to the definition of saferegion. New algorithms must be designed to computemaximum safe regions in order to reduce the number oflocation updates and thus improve efficiency. Third, asthe location updater decides when and how a bounding

box is updated, its strategy determines the accuracy,privacy and efficiency of the framework. The standardstrategy is to update when the centroid of the bounding boxmoves out of the safe region, which guarantees accuracy no miss of any change of the most probable result.To optimize privacy or efficiency, however, alternativestrategies must be devised. Compared to the previouswork, the PAM framework has the following advantages:

To our knowledge, this is the first comprehensiveframework that addresses the issue of location up-dating holistically with monitoring accuracy, effi-ciency and privacy altogether. This framework ex-tends from our previous work [21] by introducing acommon privacy model, and therefore suits realistic

scenarios. As for efficiency, the framework significantly re-duces location updates to only when an object ismoving out of the safe region and thus is very likelyto alter the query results.

As for accuracy, the framework offers correct moni-toring results at any time, as opposed to only at thetime instances of updates in systems that are basedon periodic or deviation location update.

The framework is generic in the sense that it isnot designed for a specific query type. Rather, itprovides a common interface for monitoring varioustypes of spatial queries such as range queries and

kNN queries. Moreover, the framework does notpresume any mobility pattern on moving objects.

The framework is flexible in that by designing ap-propriate location update strategies, accuracy, pri-vacy or efficiency can be optimized.

In the rest of this paper, we will explore the PAMframework, especially on the aspects of query evalu-ation and safe region computation. The remainder ofthis paper is organized as follows. Section 2 reviewsthe related work. Section 3 overviews the frameworkcomponents, followed by Sections 4 and 5 where queryevaluation and safe region computation are presented,with an emphasis on range and kNN queries. Dynamic

client update strategies are given in Section 6 to optimizeprivacy and efficiency. Experimental results of PAM areshown in Section 7.

2 RELATED WOR K

There is a large body of research work on spatial-temporal query processing. Early work assumed a staticdataset and focused on efficient access methods (e.g.,R-tree [19]) and query evaluation algorithms (e.g., [20],[37]). Recently, a lot of attention has been paid tomoving-object databases, where data objects or queriesor both of them move.


3/17

3

Assuming object movement trajectories are knowna priori, Saltenis et al. [38] proposed the Time-Parameterized R-tree (TPR-tree) for indexing movingobjects, where the location of a moving object is rep-resented by a linear function of time. Benetis et al.[3] developed query evaluation algorithms for NN andreverse NN search based on the TPR-tree. Tao et al. [41]optimized the performance of the TPR-tree and extendedit to the TPR-tree. Chon et al. [10] studied range andkNN queries based on a grid model. Patel et al. [34]proposed a novel index structure called STRIPES usinga dual transformation technique.

The work on monitoring continuous spatial queriescan be classified into two categories. The first cate-gory assumes that the movement trajectories are known.Continuous kNN monitoring has been investigated formoving queries over stationary objects [40] and linearlymoving objects [22], [36]. Iwerks et al. extended [22]to monitor distance semijoins for two linearly mov-ing datasets [23]. However, as pointed out in [39], the

known-trajectory assumption does not hold for manyapplication scenarios (e.g., the velocity of a car changesfrequently on road).

The second category does not make any assumptionon object movement patterns. Xu et al. [44] and Zhang etal. [48] suggested returning to the client both the queryresult and its validity scope where the result remainsthe same. As such, the query is reevaluated only whenthe query exits the validity scope. However, their so-lutions work for stationary objects only. For continuousmonitoring of moving objects, the prevailing approach isperiodic reevaluation of queries [24], [32], [35], [47]. Prab-hakar et al. [35] proposed the Q-index, which indexes

queries using an R-tree-like structure. At each evaluationstep, only those objects that have moved since the previ-ous evaluation step are evaluated on the Q-index. Whilethis study is limited to range queries, Mokbel et al. [32]proposed a scalable incremental hash-based algorithm(SINA) for range and kNN queries. SINA indexes bothqueries and objects, and it achieves scalability by em-ploying shared execution and incremental evaluation ofcontinuous queries [32], [43]. Kalashnikov et al. and Yu etal. suggested grid-based in-memory structures for objectand query indexes to speed up reevaluation process ofrange queries [25] and kNN queries [47]. Access methodsto support frequent location updates of moving objects

have also been investigated [24], [29]. Our study fallsinto this category but distinguishes itself from existingstudies with a comprehensive framework focusing onlocation update.

Uncertainty and privacy issues have been recentlystudied in moving object monitoring. To protect locationprivacy, various cloaking or anonymizing techniqueshave been proposed to hide the clients actual location.Among them are the spatio-temporal cloaking [18], theClique-Cloak [14], [15], the Casper anonymizer [31],hilbASR [17], [26], and peer-to-peer cloaking [11], [16].In spatio-temporal cloaking, for each location update,

the server divides the space recursively in a quad-tree-like format till a suitable subspace is found to cloakthe updated location. The CliqueCloak algorithm con-structs a clique graph to combine some clients whocan share the same cloaked spatial area. The Casperanonymizer is associated with a query processor to en-sure the anonymized area returns the same query resultas the actual location. In hilbASR, all user locationsare sorted by Hilbert space-filling curve ordering, andthen every k users are grouped together in this order.Besides location cloaking, pseudonym, dummy, and trans-

formation were also proposed for privacy preservation.Pseudonym decouples the mapping between the useridentity and the location so that an untrusted server onlyreceives the location without the user identity [33], [4].Dummy generates fake user locations (called dummies)and mixes them together with the genuine user locationinto the request [28], [46], [45]. Transformation utilizescertain one-way spatial transformations (e.g., a spacefilling curve) to map the query space to another space

and resolves query blindly in the transformed space [27].As for location uncertainty, a common model for char-acterizing the uncertainty of an object is a closed regionwith a predefined probability distribution of this objectin the region. Based on this probabilistic model, queryprocessing and indexing algorithms have been proposedto evaluate probabilistic range queries [12], [31] andkNN queries [9]. While in these studies the objects areuncertain, the queries themselves are still certain. Chenand Cheng extended the probabilistic processing to moregeneral cases where the queries are also uncertain [7].Our study, on the other hand, addresses the continuousmonitoring issue. By adopting the notion of safe re-

gion, the frequency of query reevaluation on uncertainlocation information is reduced, and hence the systemefficiency and scalability are improved.

Distributed approaches have been investigated tomonitor continuous range queries [6], [13] and contin-uous kNN queries [42]. The main idea is to shift someload from the server to the mobile clients. Monitoringqueries have also been studied for distributed Internetdatabases [8], data streams [1], and sensor databases [30].However, these studies are not applicable to monitoringof moving objects, where a two-dimensional space isassumed.

3 FUNDAMENTALS OF PAM FRAMEWORK

3.1 Privacy-Aware Location Model

In this paper, we assume that the clients are privacyconscious. That is, the clients do not want to expose their

genuine point locations to the database server to avoidspatiotemporal correlation inference attack [14], by whichan adversary may infer users private information suchas political affiliations, alternative lifestyles, or medicalproblems. For example, knowing that a user is inside aheart specialty clinic during business hours, the adver-sary can infer the user might have a heart problem. This


4/17

4

has been cited as a major privacy threat in location-basedservices and mobile computing. To protect against it,most existing work suggest replacing accurate point lo-cations by bounding boxes to reduce location resolutions[18], [14], [31], [26], [7], [17]. With a large enough location

box covering the sensitive place (e.g., the clinic) as wellas a good number of other insensitive places, the successrate or confidence of such spatiotemporal correlationinference can be reduced significantly. In our monitoringframework, we take the same privacy-aware approach.Specifically, each time a client detects his/her genuinepoint location, it is encapsulated into a bounding box.Then the client-side location updater decides whether ornot to update that box to the server.1 Without any otherknowledge about the client locations or moving patterns,upon receiving such a box, the server can only presumethat the genuine point location is distributed uniformlyin this box. To simplify the presentation in this paper,we further restrict the shape of such a bounding boxto a -by- square (or in short -square), where is

customizable for each object. Our problem is thereforeto monitor result changes of spatial queries as objectsmove, and monitor them as accurately as possible andat the lowest cost of location updates.

The key idea to solving the problem is safe region,which was defined in [21] as a rectangle within whichthe change of object location does not change the resultof any registered spatial query. Now that locations are-squares instead of points, to clarify the definition ofwithin, we use the centroid point of the square as arepresentative, so the safe region is essentially a saferegion for the centroid of the -square. However, theconsequence of introducing -square is more than that

the result of a spatial query is no longer unique. Forexample, if the -square of an object partially overlapswith a range query, this object could be either a resultobject or a non-result object of this query. As such, aunique definition of query result under -squares is aprerequisite of safe region.

Since the genuine point location of an object is dis-tributed uniformly in its -square, we can define the(unique) query result as the one with the highest prob-ability among all possible results. As in the previousrange query example, if the majority of the -square fallsinside the range query, that object is most probably aresult object of this query; otherwise, that object is most

probably a non-result object. With the notion of mostprobable result, we thereby define the safe region asa rectangle within which the change of the centroid ofthe objects -square does not change the most probableresult of any registered spatial query. The standard updatestrategy of the client is therefore to update when thecentroid of the -square is out of the safe region.

1. The computation of a proper bounding box to satisfy a certainprivacy metric (such as k-anonymity) has been extensively studied inthe literature [14], [26], [17] and is beyond the scope of this paper.Nonetheless, the larger the box is, the less successful and confidentthe adversarys inference becomes.

ObjectIndex QueryIndex

QueryProcessor LocationManagerRegisterQuery

2

31 4

5

LocationUpdater

Sa

fe

Re

gion

Loca

tion

Up

da

te

BoundingBox

ResultUpdatedatabaseserver

mobileclient

Fig. 3. PAM Framework Overview

The reason why we exclude all other less probableresults in this definition is threefold: 1) monitoring con-tinuous queries usually trades accuracy for efficiency although the most probable result does not alwaysalign with the genuine result (the result derived based ongenuine point locations of all objects), we will show inSection 4 that it is efficient to compute and thereforeprevents the server from being computationally over-

loaded; 2) if the query result were defined as the setof all possible results, the safe region would have to beextremely small to report location updates if any of thepossible results changes, which makes the update costoverwhelmingly high; 3) we do not want the choice of-square which is made by the client to affect queryresults heavily, and obviously the most probable resultsare less vulnerable than other result definitions.

3.2 Framework Overview

As shown in Fig. 3, the PAM framework consists of

components located at both the database server andthe moving objects. At the database server side, wehave the moving object index, the query index, thequery processor, and the location manager. At movingobjects side, we have location updaters. Without lossof generality, we make the following assumptions forsimplicity:

The number of objects is some orders of magnitudelarger than that of queries. As such, the query indexcan accommodate all registered queries in mainmemory while the object index can only accom-modate all moving objects in secondary memory.This assumption has been widely adopted in many

existing proposals [25], [47], [21]. The database server handles location updates se-quentially; in other words, updates are queued andhandled on a first-come-first-serve basis. This is areasonable assumption to relieve us from the issuesof read/write consistency.

The moving objects maintain good connection withthe database server. Furthermore, the communica-tion cost for any location update is a constant.With the latter assumption, minimizing the cost oflocation updates is equivalent to minimizing thetotal number of updates.


5/17

5

PAM framework works as follows (see Fig. 3). At anytime, application servers can register spatial queries tothe database server (step ). When an object sends alocation update (step ), the query processor identifiesthose queries that are affected by this update usingthe query index and then reevaluates them using theobject index (step ). The updated query results arethen reported to the application servers who registerthese queries. Afterwards, the location manager com-putes the new safe region for the updating object (step), also based on the indexes, and then sends it backas a response to the object (step ). The procedure forprocessing a new query is similar, except that in , thenew query is evaluated from scratch instead of beingreevaluated incrementally, and that the objects whosesafe regions are changed due to this new query must

be notified. Algorithm 1 summarizes the procedure atthe database server to handle a query registration/de-registration or a location update.

Algorithm 1 Overview of Database Behavior1: while receiving a request do2: if the request is to register query q then3: evaluate q;4: compute its quarantine area and insert it into the query index;5: return the results to the application server;6: update the changed safe regions of objects;7: else if the request is to deregister query q then8: remove q from the query index;9: else if the request is a location update from object p then

10: determine the set of affected queries;11: for each affected query q do12: reevaluate q;13: update the results to the application server;14: recompute its quarantine area and update the query index;15: update the safe region ofp;

It is noteworthy that, although in this paper the mostprobable result is used, this framework can also adaptto other query result definitions such as over a prob-ability confidence (e.g., returns objects that have 90%probability inside the query range). The only changesneeded to reflect the new result definition are the queryevaluation algorithms in the query processor and saferegion computation in the location manager. In the restof this paper, we stick to the definition of the mostprobable result and leave the modification details forother definitions to interested readers.

The following subsections explain the components at

the database server in detail, , and Section 6 describesthe update strategy of the client-side location updater.

3.3 The Object Index

The object index is the server-side view on all objects.More specifically, to evaluate queries, the server muststore the spatial range, in the form of a bounding box,within which each object can possibly locate. Note thatthis bounding box is different from a -square because itsshape also depends on the client-side location updater.That is, it must be a function (denoted by ) of the last

updated -square and the safe region. As such, this boxis called a bbox as a mark of distinction. In particular,for the standard update strategy, the bbox is the saferegion enlarged by /2 on each side, or formally, theMinkowski sum 2 of the safe region and a /2-square.

With the same rationale for which we assume thegenuine point location of an updating object to distributeuniformly in the -square, we assume that the genuinepoint locations are distributed uniformly in their respec-tive bboxes when queries are evaluated or reevaluated.The object index is built on the bboxes to speed upthe evaluation. While many spatial index structures canserve this purpose, this paper employs the R*-tree index[2], [19], which is most widely adopted in the literature.Since the bbox changes each time the object updates, theindex is optimized to handle frequent updates [29].

3.4 The Query Index

For each registered query, the database server stores: (1)

the query parameters (e.g., the rectangle of a range query,the query point and the k value of a kNN query), (2) thecurrent query results and, (3) the quarantine area of thequery. The quarantine area is used to identify the querieswhose results might be affected by an incoming locationupdate. It originates from the quarantine line, which isa line that splits the entire space into two regions: theinner region and the outer region. An object becomesa result object if it enters the inner region; likewise,it becomes a non-result object once it enters the outerregion. However, the ideal quarantine line is difficult tocompute, especially in the context of the most probableresult. In addition, as object locations have extensions

rather than points, the quarantine line is not unique fora query. As such, we allow fuzziness by relaxing the lineto an area called quarantine area. That is, the entirespace is split into three regions: the inner region, thequarantine area, and the outer region. The former twoare separated by the inner bound of the quarantine area,whereas the latter two are separated by theouter boundof the quarantine area. To ease the computation of thesetwo bounds, an object becomes a result object if its -square moves totally inside the inner bound; on the otherhand, an object becomes a non-result object once its -square crosses or is outside the outer bound. Therefore, aquery Q is not affected only if of the updated -square

p and its last updated -square plst, both of them aretotally inside the inner bound or both of them cross orare outside the outer bound of the quarantine area.3

For a range query q, the query window can serve as aninner bound of the quarantine area, because any objectwhose -square is fully inside q is a trivial result ofq. Onthe other hand, an outer bound can be the Minkowski

2. The Minkowski sum of two shapes A and B in Euclidean spaceis the result of adding every point in B to every point in A, i.e., theset {a+ b|a A, b B}.

3. For kNN queries, if the order of the result objects is sensitive, Q isnot affected only if both of them cross or are outside the outer region.


6/17

6

outerbound

qandinnerbound

/2

/2

(a) Range Query

q

ok-1 ok

innerbound outerbound

(b) kNN Query

Fig. 4. Quarantine Area

sum of q and a /2-square, i.e., enlarging q by /2 oneach side. The correctness of this bound can be verified

by the observation that for any -square that crossesthis bound, the majority of this square must be outsideq, thus making the object a non-result object. In casethere are different s for different objects, the largest is used. Fig. 4(a) shows the inner and outer bounds ofqs quarantine area.

For a kNN query, since only the distance to the querypoint q matters, we set both the inner and the outer

bounds as circles centered at q. Furthermore, since the k-

th NN ok determines whether other object is or is not aresult object, we set the radii of the two circles basedon ok. More specifically, the inner bound circle is setto be the minimum distance between q and the bbox ofok, so that if a -square is totally inside this circle, it isguaranteed to be closer to q than ok. On the other hand,the outer bound circle is set to be the maximum distance

between q and the bbox ofok, plus . If d(s, t) denotes thedistance between two points s and t, d(S, T) (D(S, T))denotes the minimum (maximum) distance between apair of points in areas S and T, then the radii of the innerand outer circle are d(q, ok) and D(q, ok)+, respectively.

To quickly find all affected queries, an in-memory

grid-based index is built on the quarantine areas of allqueries. The index partitions the entire space into MMuniform grid cells, and the bucket for each cell pointsto those queries whose quarantine areas overlap with orfully enclose this cell. If we define the cell(s) that overlapwith the -square of the updating object p as the homecell(s) of p, then only queries pointed at by the homecell(s) ofp or plst are affected and thus need reevaluation.

3.5 Query Processor and Location Manager

In the PAM framework, based on the object index, thequery processor evaluates the most probable result when

a new query is registered, or reevaluates the most prob-able result when a query is affected by location updates.Obviously, the reevaluation is more efficient as it can

be based on previous results. The detailed algorithms ofquery evaluation and reevaluation will be presented inSection 4.

The location manager computes the safe region of anobject p (denoted as p.sr). Recall that a safe region isa rectangle within which the change of the centroid of

ps -square does not change the most probable result ofany registered query. As queries are independent of eachother, we can further define the safe region for a single

pR

r

x

yk( )

Fig. 5. Random Movement

query Q (denoted as p.srQ) as a rectangle in which the

change of the centroid of ps -square does not changethe most probable result of Q. By this definition, p.srQis a rectangular approximation, or more accurately aninscribed rectangle, of Qs inner (if p is a result object)or outer (if p is a non-result object) regions, which areseparated by the quarantine line. The reason why thesafe region is based on the quarantine line rather thanthe quarantine area is that the latter is much coarser.Furthermore, the quarantine area is used only to filterout the queries that are not affected by a location update,so we trade accuracy for efficiency. The safe region, onthe other hand, directly dictates the frequency and hencethe cost of location updates, so we compute it based onthe more accurate quarantine line.

After each individual p.srQ is computed, p.sr is simplythe intersection of these p.srQ from all registered queries.To eliminate those queries whose safe regions do notcontribute to p.sr, the location manager further requiresevery p.srQ (and thus the p.sr) to be fully containedin the home cell(s). Recall that the home cell(s) are thegrid cell(s) of the query index where the -square of

p is contained or overlaps. By this means, the locationmanager only needs to compute the safe regions forthose queries (subsequently called relevant queries) whosequarantine areas are contained or overlap with the home

cell(s). These relevant queries are exactly those indexedby the home cell(s) of the query index.

The location manager recomputes the safe region of anobject p in two cases: 1) after a new query Q is evaluated;and 2) after p sends a location update. In the former case,since no existing queries change their quarantine lines,the new safe region p.sr is simply the intersection of thecurrent safe region p.sr and p.srQ, the safe region for thisnew query Q. Ifp.sr is different from p.sr, the new saferegion should be updated to p. In the latter case, thequarantine areas of some existing queries might change,therefore p.sr needs to be completely recomputed bycomputing the p.sr

Qfor each relevant query and then

getting the intersection.

As the objective of the PAM framework is to minimizethe number of location updates, the following theoremshows that the safe region should be the inscribed rect-angle of the inner or outer region with the maximumperimeter.

Theorem 3.1: Assume that the object p moves in arandomly chosen direction with a constant speed (seeFig. 5), and that -square is small enough to be ignored.Given a convex safe region R and the updated location

p, the amortized location update cost for p over time,


7/17

7

Costp, is

Costp = Cl 2

0

k()d

2

1 .=

Cl 2

Perimeter(R),

where Cl is the cost for one location update, is theangle between the moving direction and the positive x-axis, k() is the length of segment pr, r is the intersection

point of this direction and the boundary of R, in otherwords, r is the location at which the next location updateoccurs.

Proof: First of all, r must be unique for every .Otherwise, if there were another r, the points in segmentrr do not belong to R, which contradicts the convexassumption. As such, given , the elapsed time before

the next location update is k() . The average elapsed time

over all isR20

k()

dR20 d

=20

k()d2

. Therefore, we have

Costp = Cl 2

0

k()d

2

1 .=

Cl 2

Perimeter(R),

because20 k()d .= Perimeter(R).

Therefore, the optimal safe region p.srQ is theinscribed reectangle with the longest perimeter, orshortly Ir lp, of Qs inner or outer region. Section 5 willpresent the detailed algorithms to compute the optimal

p.srQ for each type of query.

4 QUERY PROCESSING

In this section, we present the detailed algorithms toevaluate or reevaluate a spatial query Q in terms ofthe most probable result. Aside from the definition ofthe query result, we know that Q also differs from a

conventional spatial query in that the object locationsare in the form of -square (for updating objects) orbbox (for other objects), both of which are rectangular.In this section, instead of regarding Q as a special querytype, we take an alternative approach by regardingthe space where the object locations are defined as aspecial Euclidean space. In this space, spatial relationssuch as overlapping, containment, or even distance areimplemented differently from a conventional Euclideanspace. By using the new implementations of spatialrelations, existing spatial query processing algorithmscan be applied directly to the new space.

In the following subsections, we implement two re-lations that are required for spatial queries, namely,containment, and closer.

4.1 Spatial Relations

In this new space, an object p is contained in a rectangleR if in the Euclidean space, the majority ofp is in R. Therectangle divides the enlarged safe region of any object

p into two regions: the region inside rectangle q (wherep is a result object of q) and the region outside q (wherep is not a result). The region with the larger area decidesthe most probable result.

In this new space, an object p1 is closer to a point q thanobject p2 if and only if in the Euclidean space, for tworandomly picked points a, b from p1 and p2 respectively,a is equally or more probably closer to q than b. Thecloser relation has a nice property that it is a total orderrelation, which is proved by the following preposition.

Proposition 4.1: The closer relation is a total order re-lation, that is, it satisfies 1) reflexivity, 2) antisymmetry,3) transitivity, and 4) comparability.

PROOF SKETCH. 1) and 2) are trivial.3). Transitivity: if p1 is closer than p2 and p2 is closerthan p3, then a (from p1) is more probably closer to qthan b (from p2), which is in turn more probably closerto q than c (from p3). As such, p1 is closer than p2.4). Comparability: p1, p2, either p1 is closer than p2, or

p2 is closer than p1.

Therefore, the most probable result of kNN query q isdefined as the top-k objects of all objects in the closerorder of their enlarged safe regions.

To implement the closer relation, we present anefficient algorithm that is based on finding out whichobject has more portion of area closer to point q. Insteadof computing the exact shape of such area, which isforbiddingly costly, the algorithm is based on the divide-and-conquer paradigm. It maintains a priority queue Qwhose elements are pairs of subrectangles of p1 and

p2 that have not yet been compared. Initially, the pair< p1, p2 > is inserted into Q and the portion of areawhere p1 (or p2) is closer is 0. Each time an element< p1, p

2 > pops up from Q (where p

1 is a subrectangle

ofp1 and p2 is a subrectangle ofp2), the algorithm checksif any point in p1 (or p

2) is always closer than any point

in p2 (or p1). If this is the case (case 1), the multiple ofthe area p1 (or p

2) is added to the portion of area where

p1 (or p2) is closer. If this is not the case (case 2), p1 orp2, whichever is larger, is split into 4 equal subrectanglesand thus 4 new pairs are inserted to Q. The reason tosplit the larger rectangle is that, the resulted pairs aremore probable to become pairs of case 1. The algorithmcontinues until either the portion of area where p1 (or

p2) is closer exceeds 0.5, or the queue Q becomes empty.It is noteworthy that the portion of area where p1 (or

p2) is closer is essentially the probability that p1 (orp2) is closer. As such, this algorithm always returnsthe correct result. On the other hand, the algorithm is

efficient because it terminates as soon as one portionof area exceeds 0.5. In order to let the portion of areaconverge to the actual probability more quickly, we usethe multiple of portions of area as the key to sort thepairs in Q.

4.2 Query Evaluation and Reevaluation on Object

Index

In conventional Euclidean space, a new range query isevaluated as follows. We start from the index root andrecursively traverse down the index entries that overlap


8/17


9/17

9

p(x,y)

a

b

1q

1 2

(a) Quarantine Line

p'

a

b

p*(x,y)

q

(b) The Optimal Safe RegionFig. 6. Safe Region for Range Query

more than half of the -square must reside in the querywindow. To obtain the quarantine line, we only need toconsider the special case when exactly half of the squareresides in the query window, which can be furtherdivided into two subcases. In the first subcase, p is onthe window border as box 1 shows, we have eithery = b and x + /2 b or x = a and y + /2 a.In the second subcase, p is not on the border as box2 shows, we have (/2 + a x)(/2 + b y) 2/2.The two subcases give us the quarantine line in the firstquadrant (the bold curve in Fig. 6(a)), which is defined

by the following formulae:

x = a if y b /2 , ory = b if x a /2 , or(/2 + a x)(/2 + b y) = 2/2. otherwise

(1)And the inner region in the first quadrant is therefore

the shaded shape. Summing up all the 4 quadrants, thetotal inner region of this query is the bold shape inFig. 6(b).

The second step is to find the Ir lp of the innerregion. For any inscribed rectangle whose corner point

in the first quadrant is (s, t), the perimeter is 2s + 2t. Onthe other hand, since (s, t) must be also on the quarantineline, x = s, y = t must be a solution to Equation 1.This equation shows the perimeter 2s + 2t is maximizedat p when

2 + a x =2 + b y =

2

. Thus, the

optimal safe region is the solid rectangle whose cornerpoint is p (see Fig. 6(b)). However, this safe region maynot contain the centroid of the updating object p. Forexample, in Fig. 6(b), if the centroid is at p, then allinscribed rectangles that contain p lie between the twodotted rectangles whose horizontal sides and verticalsides pass p. In this case, the optimal safe region isone of the two dotted rectangles with longer perimeter.

Therefore, we reach the following proposition on the saferegion for a result object:

Proposition 5.1: For a result object p of a range query(2a-by-2b), the corner of the safe region is:

(x, y) =

(a 212

, b 212

) if p.x a 212

&p.y b 212

(p.x, 2

2(a+/2p.x) 2b+2 ),

or( 2

2(b+/2p.y) 2a+2 ,p.y) otherwise.

Ifp is a non-result object, the safe region is an inscribedrectangle of the outer region. Such rectangle has the

diagonalsquare

sidesquare

q

p

p

(a) Diagonal and Side Squares

q

I II

p

(b) q and Inside Part

Fig. 7. Lemmas on Squares

o.bboxdiagonalsquare

III

q

sidesquare

IIIIV

touch

lowerboundingcircle upperboundingcircle

Fig. 8. Upper and Lower Bounding Circles

q

4

4

Fig. 9. Bounding Circle

longest perimeter when its corner point p is at (a, 0) or(0, b). Similar to the case when p is a result object, this

rectangle can serve as the safe region only if it containsthe centroid of updating object p; otherwise, the saferegion is chosen from the two dotted rectangles that hasa longer perimeter.

5.2 Safe Region for kNN Query

We first consider the case when object p is the i-th NN(denoted by oi) of the query. By definition, its -squaremust be closer than the bbox of oi+1, but farther than thebbox of oi1. However, the exact quarantine line (andhence the inner or outer region) for p based on this lineis complex. In what follows, we approximate the inner

region with a ring centered at the query point q.As the first step, we show that a circle centered at q

splits a -square into inside and outside parts, and theirareas are dependent on the angle of the -square to q.

Lemma 5.2: Among all squares of the same size and ofthe same distance to q, the diagonal square, whose diag-onal coincides with the line of pq, has the smallest insidepart while the side square, whose sides are parallel to pq,has the largest inside part. (ref. Fig. 7(a)).

On the other hand, the area of the inside part alsodepends on the length of pq. For example in Fig. 7(b),the two -squares are of the same angle, but the squarethat is closer to q has a larger inside part (area I) than

the farther square (area II).Lemma 5.3: For squares of the same angle to q, the

closer p to q, the larger the inside part.Applying these two lemmas, we can define the lower

and upper bounding circles for an object o. In Fig. 8, thereare two circles, plotted by solid arcs, that touch the nearand the far endpoints of the bbox of o. Then there must

be a diagonal square and a side square that are splitinto equal-areaed inside and outside parts by these twoarcs, respectively. The lower and upper bounding circles,plotted by dotted arcs, are the circles that cross thecenters of these two squares. By this definition, as long as


10/17

10

the centroid ofps -square is within the lower boundingcircle, p is always closer than o; on the other hand, aslong as the centroid is beyond the upper bounding circle,

p is always farther than o. The following propositionproves the correctness.

Proposition 5.4: Any -square whose centroid is within(beyond) the lower (upper) bounding circle for object omust be closer (farther) to q than o.

Proof: Since any point in the inside part of thediagonal square (i.e., area II) is always closer than anypoint in the bbox of o, and since the inside part is halfof the square, by definition the diagonal square is closerthan the bbox of o. On the other hand, by Lemmas 5.2and 5.3, any square whose center is closer than that ofthe diagonal square must be closer than the diagonalsquare. Applying the transitivity of the closer relation,any square whose centroid is within the lower boundingcircle is closer to q than o. The proof for the upper boundis similar.

Based on Proposition 5.4, the inner region for p (i.e.,

oi) can be approximated by a ring that is formed by thelower bounding circle for oi+1 and the upper boundingcircle for oi1. To find the radii of the upper and lower

bounding circles, we further adopt an approximationalgorithm as follows. As shown in Fig. 9, to computethe lower bounding circle, the diagonal square is firstpartitioned into M M (e.g., 4 4) sub-squares. Thenthe distance between q and the farthest endpoint (thesmall hollow or solid circles in the figure) of each

sub-square is computed. The medium (i.e., the M2

2 -thshortest) distance is set to the radius for the lower

bounding circle. This bounding circle is guaranteed tosatisfy Proposition 5.4 because the sub-squares of the

first M2

2 shortest distances (their farthest endpoints areshown as hollow circles) must be inside the boundingcircle, and these sub-squares already account for halfof the total area. Therefore, this circle can serve as anapproximation of the lower bounding circle. The sameapproximation can be applied to upper bounding circles.Obviously, better approximation can be achieved usinglarger M, which is at the cost of higher computationoverhead.

Once the ring is obtained, the safe region is the in-scribed rectangle of the ring that has the longest perime-ter (Ir lp). In [21], we showed the following proposition(see Fig. 10(a)):

Proposition 5.5: The Ir-lp of a ring that is centered atq with inner radius r and outer radius R is the one ofthe following two Ir-lp which has a longer perimeter.The perimeter of the first (horizonal) Ir-lp is 4Rsin1 +2(Rcos1 r) where 1 is:

1 =

arctg2 if x arctg2 y , orx if x < arctg2 , ory if arctg2 < y,

where x = arcsinp.xq.x

R and y = arccosq.yp.y

R . Andthe perimeter of the second (vertical) Ir-lp is 4Rcos2 +

q

p

r

R

x

xI

II

x

y

x'

x'

(a) Ring

q

p

y

x x

a

b

t

(b) Complement of a Circle

Fig. 10. Computing Ir-lp

2(Rsin2 r) where 2 is:

2 =

arcctg2 if x arcctg2 y , orx if x < arcctg2 , ory if arcctg2 < y.

Finally, we reach the follow proposition on the saferegion for a result object oi.

Proposition 5.6: The safe region of the i-th NN oi is theIr lp of the ring that consists of the upper boundingcircle for oi1 and the lower bounding circle for oi+1.

It is noteworthy that, for the first NN (i.e., i = 1),the ring degenerates to a circle. On the other hand, ifobject p is a non-result object, we can approximate theouter region by the complement of the upper boundingcircle of ok. As such, the safe region is the Ir lp ofthe complement of a circle. In [21], we showed that (seeFig. 10(b)):

Proposition 5.7: The Ir-lp of the complement of a circlecentered at q with radius r is the inscribed rectangle withone corner being the cell corner corresponding to p andthe opposite corner is x. x is either on the 1/4 circle

whose is:

=

/4 if y /4 x , orx if x < /4 , ory if y > /4,

where x = arcsinp.xq.x

r and y = arccosp.yq.y

r .

6 DYNAMIC CLIENT UPDATE STRATEGY

The standard update strategy, which updates when thecentroid of -square is out of the safe region, guarantees100% monitoring accuracy in the context of the mostprobable result. This is a static strategy where the deci-sion is made independent of previous decisions. In thissection, we discuss two dynamic strategies that achieveobjectives other than monitoring accuracy.

6.1 Mobility-Aware Update Strategy

Previously, we ignore the fact that the server receivesa series of location updates from an object. Althoughthe server cannot speculate the genuine object locationfrom an individual -square, by considering consecutiveupdates with certain background knowledge about theobjects mobility, the server might produce better specu-lations.


11/17

11

t0

t1

(a) Known Max Speed

direction

t0

t1

(b) Known Direction

Fig. 11. Problems with Mobility Knowledge

Fig. 11(a) and 11(b) show two examples where themaximum speed vm or the exact direction of the move-ment is known, respectively. In these examples, a -square is updated at time t0, then at time t1, the objectmust reside in the dotted shape, which is called thereachable area from t0. In Fig. 11(a), the reachable area isthe Minkoski sum of the -square at t0 and a circle witha radius ofvm(t1 t0), i.e., the -square expanded by thecircle at each point. Likewise, in Fig. 11(b), the reachablearea is the half-open space formed by the rays whose

ends are from the -square. If the -square at t1 overlapswith the reachable area, then the object can only locatein the part that is inside the area (shaded in Fig. 11(a)and 11(b)).

To prevent the server from narrowing down the objectlocation like this, we propose the following mobility-aware strategy.

Definition 6.1: Mobility-aware update strategy: Up-date when the centroid of -square is out of the saferegion and the -square is completely inside the reach-able area of all previous -squares.

The intuitive version of this strategy must maintainthe entire set of historic -squares. However, thanks toits dynamic property, we show in the following lemmathat it is sufficient to maintain only the reachable areafor the last -square.

Lemma 6.1: For a set of -squares of {t0, t1,...,tn} andi j n, the reachable area of tj is completely insidethat of ti as long as the -square of any ti is completelyinside the reachable area of ti1 (i 1).

In what follows, we develop an algorithm to testwhether a -square at t1 is completely inside the reach-able area of t0 when the direction is known in the range

of (l, h). This is a general case for Fig. 11(b). The ideais to take an analytic view of this area. As is illustrated inFig. 12, let o with coordinates (a, b) denote a point in the-square of t0, and let p with coordinates (x, y) denote apoint in the -square of t1. Then the condition that lineop falls in between direction (l, h) is equivalent to theinequality tg(l) (y b)/(x a) tg(h) (we consideronly the first quadrant for simplicity). Therefore, to testwhether the -square at t1 (xl x xh, yl y yh)is completely inside the reachable area of t0 is equiv-alent to testing whether any of the following two setsof inequalities with regard to x,y,a,b can be satisfied

direction

p(x,y)

o(a,b)t0

t1

Fig. 12. Reachable Area

saferegion

p0

pN

N0

idealsafearea

Fig. 13. Min-Cost Update Strategy: -Rule

simultaneously:

(x a)tg(l) + (y b) 00 a , 0 b xl x xh, yl y yh

and

(x a)tg(h) (y b) 00 a , 0 b xl x xh, yl y yh

Either of them can be regarded as the set of linear con-

straints in a linear programming (LP) problem regardingvariables x,y,a,b. We build two LP problems P1, P2 witha (dummy) objective function C = 0 and the same linearconstraints as above. Determining whether any of thetwo set of inequalities can be satisfied simultaneously isthen equivalent to testing whether P1 or P2 has a feasiblesolution. The feasibility can be tested by any LP solversuch as the classic Simplex or Ellipsoid method. The -square of t1 is completely inside the reachable area of t0only if neither P1 nor P2 is feasible.

6.2 Minimum-Cost Update Strategy

In previous sections, we use a rectangular safe regionto approximate the ideal safe area in which the changeof the centroid p of a -square does not change themost probable result of any query. Fig. 13 illustratesthe relation between a safe region and the ideal safearea. The gap between them is inevitable and could bearbitrarily large due to the following reasons: (1) a saferegion for an individual query is already a rectangularapproximation of the inner or outer region for this query;and (2) the whole safe region is obtained by intersectingthe safe regions for all individual queries, which makesit far smaller than the ideal safe area.

To guarantee 100% monitoring accuracy on the most

probable result, the standard strategy updates wheneverp moves out of the safe region, but this could be anunnecessary update as p might still be in the ideal safearea. We therefore believe that in applications where100% accuracy is not mandatory and location updatecosts are serious issues, a strategy that can trade accuracywith costs is desirable. In this subsection, we developsuch a strategy that tries to minimize the cost by addinga -rule to the standard strategy to filter out unnecessaryupdates.

More specifically, symbol is the probability of pmoving out of the ideal safe area. Let costp denote


12/17

12

the cost of not updating p in this case. As p causesa result change, costp is essentially the penalty of lossof monitoring accuracy. On the other hand, let costudenote the cost of updating p. Therefore, to minimize theexpected cost, the -rule updates only ifcostu < costp,i.e., > costu/costp.

To test whether at p is larger than costu/costpwithout sending it to the server, we need to knowan additional point p0 inside the ideal safe area (seeFig. 13). To find p0, we continue to use the standardstrategy. If the next updated location p causes no resultchanges (a feedback from the server), p is regarded as

p0. Otherwise, if p changes the result, the ideal safearea changes as well, so we continue to find p0 forthe new ideal safe area. In general, the -rule is onlyapplicable after two consecutive location updates by thestandard strategy, and the second update must cause noresult changes from the first update. This prerequisite isuseful in filtering out those ideal safe areas that are notsignificantly larger than their safe regions.

If we regard the space as a space of values, = 0when p is in the safe region and gradually increasesas p moves away from the safe region. As at anypoint is independent of the values at other points, themovement of from the border of the safe region to p can

be regarded as a discrete random walk for simplicity (seeFig. 13). Initially, at the border = 0, and by taking stepsof length , it walks away from the border towards p. Ineach step, is increased by with a probability , andnot increased with probability 1 . As such, the totalnumber of steps N = dist(p,R)/. Since the maximumvalue for is 1, = 1/N. In any step, if the -rule issatisfied, the rule must also be satisfied at p, because is

monotonously increasing as it moves. Thus, the strategyupdates the location and stops the walk in any step whenthe -rule is satisfied. On the other hand, if the -rule isnot satisfied till the last step, then the strategy does notupdate the location.

We are yet to estimate . As p0 is known to be insidethe ideal safe area, a random walk to p0 can be conductedin the same way as above. By the maximum likelihoodestimation, we should maximize the probability that at p0 does not satisfy the -rule, i.e., the probability of costu/costp. By the theory of Bernoulli process, at p0 follows a Binomial distribution whose cumulative

distribution function is bounded by exp(2 (N0N)2

N0).

The function reaches the maximum when N0 = N.Putting = costu/costp, we have =

costuN/costpN0

.The state transition diagram of this strategy is il-

lustrated in Fig. 14 where the shaded texts mean rulesatisfaction and unshaded texts mean otherwise. Thereare four states in this strategy: initial state, no up-date, first update, and second update. The -ruleis applicable only at second update and no updatewhere p0 is obtained. To transit to second update, theremust be two location updates, and the latter (regardedas p0) must cause no result changes. Once the strategydecides to update, the -rule is suspended and the state

NoUpdate FirstUpdate

SecondUpdate

stan

darda

ndno

chan

gesta

ndard

o

r-rule

standardor rule

standardand rule

stand

arda

nd

rule

InitialState

stand ard

standard

standard

andchange

standard

|

Fig. 14. State Transition Diagram

is reset to first update to wait for the next p0.

7 PERFORMANCE EVALUATION

To evaluate the monitoring performance, we implementa simulation testbed, where N moving objects movewithin a unit-square space [0..1, 0..1]. Each object detectsits point location at frequency f, encapsulates it into

a -square, and forwards the square to the locationupdater. Each object has an individual and it follows anormal distribution with mean value . We compare ourPAM framework with two other frameworks, namely,the optimal monitoring (denoted as OPT) and the periodicmonitoring (denoted as PRD). In optimal monitoring,every object has the perfect knowledge of the registeredqueries and the -squares of other moving objects atany time. Therefore, it knows precisely when the mostprobable result of any query changes, and only then doesit send a location update to the server. OPT serves as thelower bound for all monitoring frameworks. In periodicmonitoring, all objects periodically send out location

updates simultaneously and the server reevaluates allregistered queries based on these updates. Obviously, itsmonitoring accuracy and cost depend on the updatinginterval. In this paper, we test PRD with updating inter-vals 0.1 and 1, denoted as PRD(0.1) and PRD(1) hereafter.

7.1 Simulation Setup

In the simulation testbed, each object moves according tothe random waypoint mobility model: the client choosesa random point in the space as its destination and movesto it at a speed randomly selected from the range [0, 2v];upon arrival or expiration of a constant movement period

(randomly picked from the range [0, 2tv]), it choosesa new destination and repeats the same process. Thisis a well accepted and studied model in the mobilecomputing literature [5].

The workload consists of W queries, half of whichare range queries and half are kNN queries. For rangequeries, the query rectangle is a square and its sidelength is uniformly distributed in a range of [0.5qlen,1.5qlen]. For kNN queries, the query points are randomlydistributed and k ranges from 1 to kmax. In all thethree frameworks, the database server maintains an in-memory grid index (M M cells) on the queries and an


13/17

13

Parameter Default Value Parameter Default ValueN 100,000 objects W 1,000 queriesqlen 0.005 kmax 10f 0.001 per time unit 0.0005v 0.01 per time unit tv 0.005 time unitM 50

TABLE 1Simulation Parameter Settings

R*-tree index [2] on the objects. The database server issimulated on a Pentium 4 2.4GHz PC with 1GB RAMrunning WinXP SP2. Table 1 summarizes the defaultparameter settings.

To eliminate the effect from hardware configuration,the simulation uses logical time units instead of clocktime units. Each simulation run lasts for 5,000 timeunits or until the measured value stabilizes (for thosesimulations that take 12 hours or more).

The performance metrics for comparison include: Monitoring accuracy: The monitoring accuracy at

time t, ma(t) {0, 1}, is defined as whether the

monitored results for all queries accord with theresults from the OPT framework. Note that the latterare the most probable results based on the -squaresof all objects at t. The monitoring accuracy for a timeperiod [tb, te] is defined as the amortized accuracy

over time, i.e., ma(tb, te) =1

tetbtetb

ma(t)dt. Wireless communication cost: It is the amortized

number of location updates sent by a moving objectover time.

CPU time: This is measured by the amortized serverCPU time, which includes the time for query eval-uation and safe region computation.

7.2 Validity of Most Probable Result

The first set of experiments is to validate the definitionof most probable result. Under various (the mean of) and f (the location detection frequency), we comparethe most probable result from the OPT framework withthe genuine result (the result as if all the point locationswere known) for all W queries. Fig. 15(a) shows theconsistency rate, i.e., the proportion of time when thetwo results are the same. As or f increases, theconsistency rate drops. However, the curve is not linear:the drop becomes slower when and f become larger.As such, even when or f is very large, the consistencyrate is above 70%. This justifies our claim that the mostprobable result is a nice approximation of the genuineresult for monitoring tasks.

7.3 Overall Performance

The next set of experiments evaluates the overall perfor-mance of the three frameworks with default parametersettings. Fig. 15(b) shows the monitoring accuracy andcommunication cost (normalized by the cost of OPT).As is guaranteed, our PAM framework achieves 100%accuracy, while PRD gets only 80%90%. Obviously,

0.7

0.8

0.9

1

0 0.002 0.004 0.006 0.008 0.01

f(locationdetectfrequency)

ConsistencyR

ate

u=0.0001 u=0.0005

u=0.001 u=0.005

(a) Consistency of Most Probable Result

0.6

0.7

0.8

0.9

1

PAM OPT PRD(1) P RD(0.1)

Monito

ringA

ccuracy

0.1

1

10

100

PAM OPT PRD(1) P RD(0.1)

Comm.C

ost

(b) The Overall Performance

Fig. 15. Performance Evaluation

PRD(0.1) is more accurate than PRD(1) but the perfor-mance gap is less than 10%. Further, it is at the cost of10 times higher communication overhead. On the otherhand, the communication cost of PAM is much smallerthan PRD and remains close to OPT.

7.4 Effects of

In this subsection, we evaluate the effect of on theperformance. We vary (the mean of ) from 0 to0.01 and Fig. 16(a) shows the corresponding monitor-

ing accuracy. While PAM achieves 100% accuracy, theaccuracy of PRD(1) and PRD(0.1) drops significantly as increases. The drop is mainly caused by the increasingspatial vagueness introduced by the -square. However,the rate of the drop decreases as increases, which inturn verifies the fact that the most probable result isstable for even large -squares.

Fig. 16(b) shows the communication cost. The costfor OPT is almost the same for all settings, becausethe change of (and thus ) merely changes the queryresults, not necessarily the frequency of result changes.Similarly, PRD(1) and PRD(0.1) (not plotted) have con-stant costs of 1 and 10, respectively. On the other hand,

the cost of PAM consistently grows as increases, buteven for = 0.01, which is already large in practice,it still outperforms PRD(1) by more than 40%. Further-more, the rate of the increase drops as increases, whichindicates that the approximation ratio of the safe regionto the quarantine area becomes steady.

Fig. 16(c) shows the CPU cost of PRD and PAM.PAM increases faster than PRD(1) and PRD(0.1), becausethe increase for PAM arises from two aspects morelocation updates and more complex query re-evaluation(especially for kNN queries) whereas the increase forPRD arises only from the latter. Nonetheless, even for


14/17

14

0.7

0.8

0.9

1

0 0.0001 0.001 0.01

u

Monito

ringAccuracy

PAM OPT PRD(1) PRD(0.1)

(a) Accuracy

0

0.2

0.4

0.6

0.8

1

0 0.0001 0.001 0.01

u

Comm.C

ost


(b) Communication Cost

1

10

100

1000

0 0.0001 0.001 0.01u

CPU(sec/timeunit)

PAM PRD(1) PRD(0.1)

(c) CPU Cost

Fig. 16. vs. Monitoring Accuracy, Communication, andCPU Cost

= 0.01, PAM is still about 1/20 that of PRD(1). There-fore, we can conclude that PAM is robust and efficientfor various privacy settings.

7.5 Scalability

This subsection evaluates the scalability of all frame-works in terms of the servers CPU time and com-munication cost. Fig. 17(a) shows the CPU time whenthe number of registered queries (W) increases from10 to 1,000. PAM only increases by less than 10 times

because the grid-based query index filters out mostof the unaffected and irrelevant queries. However, forPRD(1) and PRD(0.1), the CPU time is linear to W, asthey need to reevaluate every query at each batch oflocation updates. When W = 1, 000, for one logical timeunit, the server needs 1.6 CPU seconds to monitor the100,000 moving objects using PAM, 53 seconds usingPRD(1), and 217 seconds using PRD(0.1). As PRD up-dates locations periodically, the high CPU cost imposeson it a maximum update frequency: in this example, theupdate frequency is at most once every 21.7 seconds.PAM has no such limitation. In terms of communicationcost, although PAM increases linearly with respect to W,

it is still less than double of OPT. All the above resultssuggest that PAM is robust under various W settings.

Similarly, we conduct simulations to vary the numberof objects (N) from 100 to 100,000. Fig. 18(a) shows thatthe CPU cost only increases by about 40 times whenN increases by 1,000 times, thanks to the R*-tree indexwhich is incrementally maintained. In contrast, PRD(1)and PRD(0.1) both increase at least linearly to N, as theyneed to build a new R*-tree at each update to reevaluateall queries. Similarly, Fig. 18(b) shows that the commu-nication cost of PAM only increases by about 200 timeswhen N increases by 1,000 times. This suggests that

0.1

1

10

100

1000

10 100 1000Num.ofQueries(W)

CPUTim

e(sec/unit)

PAM PRD(1) PRD(0.1)

(a) CPU Time

0.01

0.1

1

10

10 100 1000NumofQueries(W)

Comm.C

ost


(b) Communication CostFig. 17. Performance vs. Query Numbers (W)

0.001

0.1

10

1000

100 1000 10000 100000

Num.OfObjects(N)

CPUTim

e(s

ec/unit)

PAM PRD(1) PRD(0.1)

(a) CPU Time

0.01

0.1

1

10

100 1000 10000 100000Num.ofObjects(N)

Comm.C

ost



Fig. 18. Performance vs. Object Numbers (N)

although a denser object distribution makes safe regionsshrink, only a decreasing portion of objects affects thequarantine area of any query and hence the safe regionof any object. In summary, PAM is more scalable thanPRD in terms of CPU and communication cost.

7.6 Effects of Query Types

In this subsection, we study the performance of PAM onrange and kNN queries separately. We vary the averagequery length qlen of range queries and kmax the max-imum k of kNN queries. The communication costs areplotted in Fig. 19(a) and 19(b), respectively. We observe

that for any parameter setting, PAMs communicationcost is at most three times as much as that of OPT. Forrange queries, as qlen increases, the communication costof OPT always increases at a steady pace. However, thecommunication cost of PAM increases more slowly whenqlen is relatively small (at 0.001) or large ( 0.01). Thiscan be explained by the fact that when qlen is relativelysmall or large, the safe regions are determined more

by the home cell than by the relevant queries. Since thesize of a cell is fixed, the cost tends to saturate. On theother hand, for kNN queries, as kmax increases, the costsof both OPT and PAM grow steadily. Even so, PAMmanages to narrow the gap when kmax becomes larger.This suggests that for a heavy workload when resultschange frequently, the safe region achieves even betterapproximation to the ideal safe area.

7.7 Sensitivity of PAM

In this subsection, we study the sensitivity of PAM toother influential factors, namely, the average movingspeed (v) and the average constant movement period (tv)for the moving objects. Fig. 20(a) shows the communica-tion cost when v varies from 0.001 to 1 per logical timeunit. The costs of both PAM and OPT increase linearly as


15/17

15

0

0.1

0.2

0.3

0.4

0.5

0.001 0.01 0.1

qlen

Comm.C

ost

PAM OPT

(a) Range Queries

0

0.5

1

1.5

2

0 10 20

kmax

Comm.C

ost

PAM OPT

(b) kNN Queries

Fig. 19. Performance vs. Query Types

0.01

0.1

1

10

100

0.001 0.01 0.1 1

AverageSpeed

Comm.C

ost

(pertim

eunit)

20

40

60

Comm.C

ost

(perd

istanceunit)

PAM OPT PAM OPT

(a) v

0

0.2

0.4

0.6

0.001 0.01 0.1 1

Constant-movingInterval

Comm.Cost

PAM OPT

(b) tv

Fig. 20. Communication Cost vs. v and tv

v increases, because the time of an object staying in a saferegion is inversely proportional to v. To eliminate thiseffect, we also plot the communication cost per distanceunit on the secondary y-axis in the same figure, andobserve that this cost is independent ofv. In other words,the update cost of PAM is not heavily dependent on thespeed of object movement on a trajectory, but on thelength of the trajectory. The CPU time shows a similartrend and hence is not plotted. In Fig. 20(b), we also varytv from 0.001 to 1 time unit and find that it has little effecton the performance of PAM. As such, we conclude that

PAM is robust to various moving parameters.The next influential factor is the M M grid parti-

tioning of the query index. We vary M from 5 to 100and plot both the communication cost and CPU timein Fig. 21. The larger is the value of M, the smalleris the grid cell size. The communication cost increasesmonotonously with M because the grid cell sets thelargest possible safe region of an object. Nonetheless,the cost difference between M = 5 and M = 50 is notsignificant but there is a sharp increase from M = 50 toM = 100. The explanation is the same as in Section 7.6,which is, when M is small and moderate, the safe regionsare determined more by the relevant queries than by

the grid cell, but when M is large the cell becomesdominate. On the other hand, the CPU time decreasesmonotonously because the number of relevant queries inthe cell decreases and hence the safe region computationis faster. In this figure, M = 50 yields a fairly lowcommunication cost as well as a fairly low CPU time.From this experiment, we can see that it is advantageousto adapt the cell size to the servers workload: we firstuse a large M to partition the grid, and later if theworkload turns out to be low, we enlarge the cell size

by merging the cell where the update occurs with itsneighboring cells within a certain distance. In this way,

we can take full advantage of the CPU resource andreduce the communication cost (by enlarging the saferegion) as much as possible.

0

0.2

0.4

0.6

0.8

1

5 10 50 100

Num.ofCells(MxM)

Comm.C

ost

0

20

40

60

80

C

PU

Time(sec/unit)

Comm.Cost CPUTime

Fig. 21. Performance vs. Grid Partitioning

7.8 Dynamic Update Strategy

The last set of experiments is conducted to evaluate theminimum cost update strategy. We vary the thresholdfor the -rule, i.e., costu/costp from 0.01 to 1. Fig. 22(a)and 22(b) show the monitoring accuracy and commu-nication cost in comparison with the standard strat-egy. The two curves of PAM show a similar trend as increases, which means that through the strategy

effectively trades accuracy for communication cost, orvice versa. Interestingly, when 0.1, the decrease ofthe communication cost is accompanied by almost thesame decrease of accuracy; however, when > 0.1, theaccuracy drops more slowly than the communicationcost. This shows that most ideal safe area is far largerthan the safe region, so even an aggressive can stillkeep the object inside the ideal safe area.

8 CONCLUSIONS

This paper proposes a framework for monitoring con-tinuous spatial queries over moving objects. The frame-

work is the first to holistically address the issue oflocation updating with regard to monitoring accuracy,efficiency and privacy. We provide detailed algorithmsfor query evaluation/reevaluation and safe region com-putation in this framework. We also devise three clientupdate strategies that optimize accuracy, privacy andefficiency, respectively. The performance of our frame-work is evaluated through a series of experiments. Theresults show that it substantially outperforms periodicmonitoring in terms of accuracy and CPU cost whileachieving a close-to-optimal communication cost. Fur-thermore, the framework is robust and scales well withvarious parameter settings, such as privacy requirement,

moving speed, and the number of queries and movingobjects.

As for future work, we plan to incorporate other typesof queries into the framework, such as spatial joins

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1lambda

Monito

rin

gA

ccuracy

standard minimumcost

(a) Accuracy

0

0.1

0.2

0.3

0.4

0.5

0.01 0.1 1lambda

Comm.C

ost

standard minimumcost


Fig. 22. Minimum Cost Strategy vs. Standard Strategy


16/17

16

and aggregate queries. We also plan to further opti-mize the performance of the framework. In particular,the minimum-cost update strategy shows that the saferegion is a crude approximation of the ideal safe area,mainly because we separately optimize the safe regionfor each query, but not globally. A possible solutionis to sequentially optimize the queries but maintainthe safe region accumulated by the queries optimizedso far. Then the optimal safe region for each queryshould depend not only on the query, but also on theaccumulated safe region.

REFERENCES

[1] S. Babu and J. Widom. Continuous queries over data streams. InProc. SIGMOD, 2001.

[2] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree:An efficient and robust access method for points and rectangles.In Proc. SIGMOD, pages 322331, 1990.

[3] R. Benetis, C. S. Jensen, G. Karciauskas, and S. Saltenis. Nearestneighbor and reverse nearest neighbor queries for moving objects.In Proc. IDEAS, 2002.

[4] A. Beresford and F. Stajano. Location privacy in pervasive

computing. IEEE Pervasive Computing, 2(1):4655, 2003.[5] J. Broch, D. A. Maltz, D. Johnson, Y.-C. Hu, and J. Jetcheva. A

performance comparison of multi-hop wireless ad hoc networkrouting protocols. In Proc. ACM/IEEE MobiCom, pages 8597, 1998.

[6] Y. Cai, K. A. Hua, and G. Cao. Processing range-monitoringqueries on heterogeneous mobile objects. In Proc. MDM, 2004.

[7] J. Chen and R. Cheng. Efficient evaluation of imprecise location-dependent queries. In ICDE, pages 586595, 2007.

[8] J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalablecontinuous query system for internet databases. In Proc. SIGMOD,2000.

[9] R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying im-precise data in moving object environments. TKDE, 16(9), Sept.2004.

[10] H. D. Chon, D. Agrawal, and A. E. Abbadi. Range and kNNquery processing for moving objects in grid model. ACM/Kluwer

MONET, 8(4):401412, 2003.

[11] C.-Y. Chow, M. F. Mokbel, and X. Liu. A peer-to-peer spatialcloaking algorithm for anonymous location-based services. InProc. of ACM GIS, pages 171178, 2006.

[12] D.Pfoser and C. Jensen. Capturing the uncertainty of moving-objects representations. In SSDBM, 1999.

[13] B. Gedik and L. Liu. MobiEyes: Distributed processing of contin-uously moving queries on moving objects in a mobile system. InProc. EDBT, 2004.

[14] B. Gedik and L. Liu. Location privacy in mobile systems: Apersonalized anonymization model. In ICDCS, pages 620629,2005.

[15] B. Gedik and L. Liu. Protecting location privacy with personalizedk-anonymity: Architecture and algorithms. IEEE Transactions on

Mobile Computing, 7(1):118, 2008.[16] G. Ghinita, P. Kalnis, and S. Skiadopoulos. Mobihide: A mobile

peer-to-peer system for anonymous location-based queries. InProceedings of SSTD, 2007.

[17] G. Ghinita, P. Kalnis, and S. Skiadopoulos. Prive: Anonymouslocation-based queries in distributed mobile systems. In Proc. ofWWW 07, pages 371380, 2007.

[18] M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In MobiSys,2003.

[19] A. Guttman. R-trees: A dynamic index structure for spatialsearching. In Proc. SIGMOD, 1984.

[20] G. R. Hjaltason and H. Samet. Distance browsing in spatialdatabases. TODS, 24(2):265318, 1999.

[21] H. Hu, J. Xu, and D. L. Lee. A generic framework for monitoringcontinuous spatial queries over moving objects. In Proc. SIGMOD,pages 479490, 2005.

[22] G. Iwerks, H. Samet, and K. Smith. Continuous k-nearest neigh-bor queries for continuously moving points with updates. In Proc.VLDB, 2003.

[23] G. S. Iwerks, H. Samet, and K. Smith. Maintenance of spatialsemijoin queries on moving points. In Proc. VLDB, 2004.

[24] C. S. Jensen, D. Lin, and B. C. Ooi. Query and update efficientB+-tree based indexing of moving objects. In Proc. VLDB, 2004.

[25] D. V. Kalashnikov, S. Prabhakar, and S. E. Hambrusch. Mainmemory evaluation of monitoring queries over moving objects.Distributed Parallel Databases, 15(2):117135, 2004.

[26] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventinglocation-based identity inference in anonymous spatial queries.TKDE, 19(12):17191733, 2007.

[27] A. Khoshgozaran and C. Shahabi. Blind evaluation of nearestneighbor queries using space transformation to preserve locationprivacy. In Proceedings of SSTD, 2007.

[28] H. Kido, Y. Yanagisawa, and T. Satoh. An anonymous commu-nication technique using dummies for location-based services.In Proc. of the Second International Conference on Pervasive Services(ICPS), 2005.

[29] M.-L. Lee, W. Hsu, C. S. Jensen, B. Cui, and K. L. Teo. Supportingfrequent updates in R-trees: A bottom-up approach. In Proc.VLDB, 2003.

[30] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong.TAG: A tiny aggregation service for ad-hoc sensor networks. InProc. USENIX OSDI, 2002.

[31] M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The new casper: Queryprocessing for location services without compromising privacy. InVLDB, pages 763774, 2006.

[32] M. F. Mokbel, X. Xiong, and W. G. Aref. SINA: Scalable incremen-

tal processing of continuous queries in spatio-temporal databases.In Proc. SIGMOD, 2004.

[33] G. Myles, A. Friday, and N. Davies. Preserving privacy in envi-ronments with location-based applications. Pervasive Computing,2(1):5664, 2003.

[34] J. M. Patel, Y. Chen, and V. P. Chakka. STRIPES: An efficient indexfor predicted trajectories. In Proc. SIGMOD, 2004.

[35] S. Prabhakar, Y. Xia, D. V. Kalashnikov, W. G. Aref, and S. E.Hambrusch. Query indexing and velocity constrained indexing:Scalable techniques for continuous queries on moving objects.IEEE Trans. on Computers, 51(10), 2002.

[36] K. Raptopoulou, A. Papadopoulos, and Y. Manolopoulos. Fastnearest-neighbor query processing in moving object databases.GeoInfomatica, 7(2):113137, 2003.

[37] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighborqueries. In Proc. SIGMOD, 1995.

[38] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez.

Indexing the positions of continuously moving objects. In Proc.SIGMOD, 2000.

[39] Y. Tao, C. Faloutsos, D. Papadias, and B. Liu. Prediction andindexing of moving objects with unknown motion patterns. InProc. SIGMOD, 2004.

[40] Y. Tao, D. Papadias, and Q. Shen. Continuous nearest neighborsearch. In Proc. VLDB, 2002.

[41] Y. Tao, D. Papadias, and J. Sun. The TPR*-tree: An optimizedspatio-temporal access method for predictive queries. In Proc.VLDB, 2003.

[42] W. Wu, W. Guo, and K.-L. Tan. Distributed processing of movingk-nearest-neighbor query on moving objects. In ICDE, 2007.

[43] X. Xiong, M. F. Mokbel, and W. G. Aref. SEA-CNN: Scalableprocessing of continuous k-nearest neighbor queries in spatio-temporal databases. In Proc. ICDE, 2005.

[44] J. Xu, X. Tang, and D. L. Lee. Performance analysis of location-dependent cache invalidation schemes for mobile environments.

TKDE, 15(2):474488, 2003.[45] M. L. Yiu, C. S. Jensen, X. Huang, and H. Lu. Spacetwist: Man-aging the trade-offs among location privacy, query performance,and query accuracy in mobile services. In Proc. ICDE08, 2008.

[46] T.-H. You, W.-C. Peng, and W.-C. Lee. Protect moving trajectorieswith dummies. In The International Workshop on Privacy-AwareLocation-based Mobile Services, 2007.

[47] X. Yu, K. Q. Pu, and N. Koudas. Monitoring k-nearest neighborqueries over moving objects. In Proc. ICDE, 2005.

[48] J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee. Location-based spatial queries. In Proc. SIGMOD, 2003.


17/17

17

Haibo Hu is an Assistant Professor in the De-partment of Computer Science, Hong Kong Bap-tist University. Prior to this, he held severalresearch and teaching posts at HKUST andHKBU. He received his PhD degree in Com-puter Science from the Hong Kong Universityof Science and Technology in 2005. His re-search interests include mobile and wirelessdata management, location-based services, andprivacy-aware computing. He has published 20

research papers in international conferences, journals and book chapters. He is also the recipient of many awards,including ACM Best PhD Paper Award and Microsoft Imagine Cup.

Jianliang Xu is an associate professor in theDepartment of Computer Science, Hong KongBaptist University. He received the BEng de-gree in computer science and engineering fromZhejiang University, Hangzhou, China, in 1998and the PhD degree in computer science fromthe Hong Kong University of Science and Tech-nology in 2002. He was a visiting scholar inthe Department of Computer Science and Engi-neering, Pennsylvania State University, Univer-sity Park. His research interests include data

management, mobile/pervasive computing, wireless sensor networks,and distributed systems. He has published more than 70 technicalpapers in these areas, most of which appeared in prestigious journalsand conference proceedings. He currently serves as a vice chairman ofACM Hong Kong Chapter. He is a senior member of the IEEE.

Dik Lun Lee received the BSc degree in elec-tronics from the Chinese University of HongKong and the MS and PhD degrees in computerscience from the University of Toronto, Canada.He is currently a professor in the Departmentof Computer Science and Engineering, HongKong University of Science and Technology. He

was an associate professor in the Department ofComputer Science and Engineering, Ohio StateUniversity, Columbus. He was the founding con-ference chair for the International Conference on

Mobile Data Management and served as the chairman of the ACMHong Kong Chapter in 1997. His research interests include informationretrieval, search engines, mobile computing, and pervasive computing.

Documents

PAM_ an Efficient and Privacy Aware Framework for Continiously Moving Objects