Spatio-temporal compression of trajectories in road networks

Iulian Sandu Popa & Karine Zeitouni & Vincent Oria & Ahmed Kharrat

Received: 3 May 2013 / Revised: 3 March 2014 / Accepted: 1 April 2014
© Springer Science+Business Media New York 2014

Abstract With the proliferation of wireless communication devices integrating GPS technology, trajectory datasets are becoming more and more available. The problems concerning the transmission and the storage of such data have become prominent with the continuous increase in volume of these data. A few works in the field of moving object databases deal with spatio-temporal compression. However, these works only consider the case of objects moving freely in the space. In this paper, we tackle the problem of compressing trajectory data in road networks with deterministic error bounds. We analyze the limitations of the existing methods and data models for road network trajectory compression. Then, we propose an extended data model and a network partitioning algorithm into long paths to increase the compression rates for the same error bound. We integrate these proposals with the state-of-the-art Douglas-Peucker compression algorithm to obtain a new technique to compress road network trajectory data with deterministic error bounds. The extensive experimental results confirm the appropriateness of the proposed approach that exhibits compression rates close to the ideal ones with respect to the employed Douglas-Peucker compression algorithm.

Keywords Spatio-temporal data compression · Lossy compression · Deterministic error bounds · Data models · Moving objects

Geoinformatica
DOI 10.1007/s10707-014-0208-4

I. Sandu Popa (*)
University of Versailles Saint-Quentin and INRIA Paris-Rocquencourt, 45 avenue des Etats-Unis, 78035 Versailles, France
URL: http://www.prism.uvsq.fr/~isap/

K. Zeitouni : A. Kharrat
University of Versailles Saint-Quentin, 45 avenue des Etats-Unis, 78035 Versailles, France

K. Zeitouni
URL: http://www.prism.uvsq.fr/~zeitouni/

V. Oria
New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
URL: http://web.njit.edu/~oria/



1 Introduction

In recent years, advances in positioning technology have made GPS more accurate, cheaper and less power-hungry. Consequently, GPS is frequently integrated into mobile devices with wireless communication capabilities, such as mobile phones, PDAs and sensors. This makes tracking and monitoring moving objects (e.g., vehicles in a road network, pedestrians in an airport, migratory animals or freight transportation) easier and economically feasible. In this context, enormous volumes of spatio-temporal data are already available (e.g., from delivery companies or public transportation agencies), and the number of applications using such data will increase in the coming years. The transportation domain is a typical example: solutions for transportation planning, safety or environmental impact can be based on monitoring moving objects and analyzing their trajectories.

An important observation regarding spatio-temporal data is that trajectory datasets are generally large. For instance, a delivery company (e.g., the postal service) monitoring its vehicle fleet can gather in a year a dataset of several hundred gigabytes of trajectory data, depending on the number of tracked vehicles and the position sampling rates. Therefore, it is easy to understand the computational, transmission and storage challenges that are entailed in this context. A solution to these problems can be found in trajectory compression techniques [24]. A few works have already proposed methods to compress the trajectories of moving objects (MOs) [4, 18, 21, 24, 26].

The trajectory of an MO is continuous in nature. Nevertheless, an MO trajectory is recorded by sampling the MO positions at different moments in time. For example, a typical consumer GPS refreshes its location every two seconds. Hence, the actual recorded trajectory is a finite sequence of time-stamped positions that approximates the original trajectory. To simulate the continuous movement, interpolation or extrapolation techniques can then be applied to the discrete data representation. Linear interpolation [2, 4, 6, 7, 14, 21, 24, 26, 31] appears to be the most widely adopted, as it provides a good balance between representation accuracy and computational cost. However, other encoding techniques for trajectory smoothing [5], such as cubic splines or clothoids [20], have been proposed. Intuitively, the higher the sampling rate, the more accurate the trajectory approximation. Yet most applications relying on MO trajectories do not require high-fidelity trajectory representations. The average error of a commercial GPS is on the order of a few meters, but most applications can be satisfied with (much) lower precision of GPS traces. Moreover, the precision needed by an application often decreases as the data gets older. Based on these observations, a few techniques [4, 18, 21, 24, 26] have been proposed to intelligently discard the data points of the original sampled trajectory that carry less information on the general movement of the MO.
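To make the linear interpolation concrete, here is a minimal Python sketch (not the authors' code) that estimates an MO's position at an arbitrary time t from a time-ordered list of (t, x, y) samples. The `Sample` tuple layout and the `position_at` name are illustrative assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (t, x, y), sorted by timestamp


def position_at(samples: List[Sample], t: float) -> Tuple[float, float]:
    """Linearly interpolate the (x, y) position of an MO at time t.

    t must lie within the recorded interval [samples[0].t, samples[-1].t].
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the recorded time interval")
    # Index of the first sample whose timestamp is strictly greater than t.
    i = bisect_right(times, t)
    if i == len(samples):  # t equals the last timestamp
        return samples[-1][1], samples[-1][2]
    t0, x0, y0 = samples[i - 1]
    t1, x1, y1 = samples[i]
    r = (t - t0) / (t1 - t0)  # fraction of the interval elapsed at time t
    return x0 + r * (x1 - x0), y0 + r * (y1 - y0)
```

For example, with samples (0, 0, 0), (2, 10, 0) and (4, 10, 10), the estimated position at t = 3 is (10.0, 5.0).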

Another important observation is that in several situations the object movements are constrained. For example, vehicles move on road networks and highways, and trains move on railroad networks. In the case of constrained MOs, the movement is represented with reference to the network space (hereafter called the constrained model [14]) instead of the two-dimensional space (hereafter called the 2D model [24]). This type of data representation has two main advantages. First, the constrained model allows for dimensionality reduction [29], which leads to better storage and query performance than the 2D model. Second, the 2D model is less accurate than the constrained model, since it limits the spatial representation of the trajectories to a linear interpolation between the reported positions, while in fact the MOs follow the exact geometry of the transportation network [29]. This second aspect is even more apparent in the context of compression, as the average length of the interval between two consecutive reported positions increases due to the elimination of sampled data points. However, with one exception [18], the few proposed compression methods for trajectory data only consider the case of free movement.

In this paper, we focus on reducing the size of in-network¹ trajectory datasets. The main objective is to obtain a good compromise between the compression rate and the error introduced by compression. Moreover, we are interested in approximations that are bounded (i.e., that guarantee an upper bound on the introduced error) with respect to the trajectory connecting the original data points and that can be adjusted parametrically. Therefore, our focus is on lossy spatio-temporal compression (STC) with deterministic error bounds. This makes it possible to adjust both the compression rate and the quality of the data presented to the application. Since we want to obtain the best compression rate for a given error bound, we focus on batch compression algorithms for historical trajectory data, i.e., where the complete trajectory is known in advance. Note also that STC differs from standard compression techniques (e.g., those used to compress text or binary files) in two main ways. First, the compression process is irreversible, i.e., the original data cannot be restored from the compressed data. Second, the compressed data can be queried directly, i.e., without decompression.

The existing techniques for two-dimensional STC can indeed easily be adapted to the case of constrained movement. However, this leads to very poor compression rates due to the peculiar characteristics of the network space compared with the Euclidean space. The main difference lies in the discontinuity points of the network space, i.e., the points where an MO moves from one road to another. This is illustrated in Fig. 1, which depicts a trajectory traversing three road junctions. The typical constrained model for trajectory data requires creating a new trajectory unit at each road junction [14]. Hence, the trajectory in Fig. 1 will have at least four data units regardless of the accuracy of the representation. To point out the impact of the discontinuity problem on compression, we compare in Fig. 2 the compression rates of a basic adaptation of an STC method to in-network compression with those of the optimal in-network compression and of the 2D compression (cf. Section 6.2). All the represented STC methods are based on adaptations of the Douglas-Peucker algorithm. Note that the optimal compression considered here is also based on the Douglas-Peucker algorithm [8] and must not be confused with the compression obtained by using the optimal compression algorithm [17] (see Section 2.1.2). Specifically, to compute the optimal compression, we consider the most favorable case where a trajectory has no discontinuity, i.e., it is contained in a single road. Therefore, in the “optimal” case, the compression rate depends only on the ratio between the number of data points in the compressed and the original representations. But when we take into account trajectories with discontinuity points, the compression rates are much lower (see the “basic adaptation of STC” curve in Fig. 2). This indicates that the typical data models for in-network trajectories are inadequate for data compression. We propose two complementary solutions to the discontinuity problem, as indicated in Fig. 1. We define an extended data model based on a generalized trajectory unit that can extend over several road segments. In addition, we propose a network partitioning algorithm that aggregates the fine-grained network edges into longer network paths that are shared among many trajectories and hence reduces the number of discontinuity points in the network.
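The effect of discontinuity points can be sketched with a small Python helper. The hypothetical `count_units` function starts a new trajectory unit whenever the traversed edge belongs to a different road than the previous edge. With one road per edge (the typical constrained model), a trip over four edges, like the one in Fig. 1, needs four units; if the same edges are grouped into a single long path, one unit suffices. The edge identifiers and the `road_of_edge` mapping are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def count_units(edge_sequence: List[int], road_of_edge: Dict[int, int]) -> int:
    """Count trajectory units when a new unit starts at every road change.

    edge_sequence: consecutive network edges traversed by one trajectory.
    road_of_edge: hypothetical mapping from an edge id to the road (path)
        containing it; in the edge-oriented model each edge is its own road.
    """
    units = 0
    previous_road: Optional[int] = None
    for edge in edge_sequence:
        road = road_of_edge[edge]
        if road != previous_road:  # discontinuity point: a new unit begins
            units += 1
            previous_road = road
    return units


trip = [1, 2, 3, 4]
print(count_units(trip, {e: e for e in trip}))  # one road per edge: 4 units
print(count_units(trip, {e: 10 for e in trip}))  # one long path: 1 unit
```

Since every unit keeps at least its two end points whatever the error bound, reducing the number of units raises the best achievable compression rate.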

¹ For simplicity, we use the term “in-network” to denote a road network or any other type of transportation network.


More specifically, the contributions of this paper are the following:

• We present a novel extended data model and a detailed analysis of the limitations of the existing 2D compression algorithms. Our data model overcomes the problems of the standard network data model in the context of in-network trajectory compression.

• We propose a method that partitions the road network into long paths based on the trajectory data distribution to further increase the compression rate for the same error bound.

• Based on the data model, the partitioning method and the state-of-the-art STC algorithm, we propose a compression method with deterministic error bounds and an error measure for in-network trajectories.

• We validate our approach through extensive experiments demonstrating that the proposed compression method is well suited to constrained trajectory data. Moreover, the proposed compression method offers higher compression rates, which are very close to the optimal compression rates, with smaller errors than the reference non-constrained compression method.

The rest of this paper is organized as follows: Section 2 presents the related work. Section 3 introduces the network model as well as a generalized trajectory data model. Section 4 describes the proposed algorithm for partitioning the road network into long paths. Section 5 introduces the in-network compression method, which is based on the well-known Douglas-Peucker line simplification algorithm, and the error measure. The experimental results are given in Section 6. Finally, we conclude and discuss some directions for future work in Section 7.

2 Related work

Several works deal with STC. In this section, we first present the methods that compress trajectories of non-constrained MOs and then discuss the few methods that consider explicit or implicit compression of constrained trajectories. Lastly, we present a few works dealing with trajectory clustering and trajectory-based network generalization, since these works are related to the proposed network partitioning algorithm.

Fig. 1 Trajectory representation with and without discontinuity over road segments

Fig. 2 Compression rates of optimal in-network, basic in-network and 2D spatio-temporal compression

2.1 STC for non-constrained MOs

As already mentioned, most of the existing works that deal with STC consider objects moving freely in Euclidean space. The proposed techniques aim at improving either the compression quality or the resource consumption (i.e., CPU and memory). The first category includes algorithms for data reduction with deterministic error bounds. These algorithms are mostly suited to historical data, i.e., where the entire dataset is known in advance, but they can also be employed to compress trajectory streams. However, their time complexity is O(N²), which might be prohibitive in certain cases. The second category comprises algorithms better suited to sampling trajectory streams. Whereas these methods are resource-efficient (i.e., O(1) time complexity per location update), they offer no guarantee on the introduced compression error.

2.1.1 STC with error bounds

Meratnia and de By were among the first to tackle the problem of STC for MOs [24]. They proposed several extensions that adapt to STC the line generalization (or line simplification) techniques typically employed in computer graphics. The line simplification problem consists in approximating a polyline with a simplified polyline, i.e., one with fewer vertices. Line simplification typically refers to two classes of problems. The min-# problem is to minimize the number of vertices of the simplified polyline under a given error bound. The min-ε problem is to minimize the approximation error for a given number of vertices of the simplified polyline. This work falls in the first category of the line simplification problem, i.e., the min-# problem. In addition, the vertices of the simplified polyline can be computed either as a subset of the original polyline vertices (i.e., sampling) or as vertices distinct from those of the original polyline. In this paper, we consider the sampling algorithms.

Two classes of STC algorithms have been proposed in [24]: batch, i.e., for historical or offline data, and online, i.e., for trajectory streams. The Douglas-Peucker (DP) algorithm [8] is the most effective batch algorithm, whereas the open window methods seem to offer the best results for online compression [24]. Since the focus of this paper is on batch STC, we detail in Section 2.1.2 the DP algorithm and its adaptation to spatio-temporal data. Note that the open window algorithms are quite similar to the DP algorithm. The main difference is that the compression is done in a window of variable size containing the last points in the data series, and not on the entire trajectory. Hence, the batch algorithms are expected to consistently produce higher-quality results than the online methods.
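Since the open window methods are described here only in words, a minimal Python sketch may help compare them with DP. It implements one common variant (an assumption; [24] discusses several): the window starts at an anchor point and grows one point at a time, and as soon as an intermediate point lies farther than the threshold from the segment joining the anchor and the newest (float) point, the point just before the float is kept and becomes the new anchor. Distances are plain Euclidean point-to-segment distances, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment (a, b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the projection of p, clamped to the segment.
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def opening_window(points: List[Point], threshold: float) -> List[Point]:
    """Online-style simplification: only the current window is examined."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor, float_i = 0, 2
    while float_i < len(points):
        violated = any(
            _dist_to_segment(points[i], points[anchor], points[float_i]) > threshold
            for i in range(anchor + 1, float_i)
        )
        if violated:
            anchor = float_i - 1  # close the window just before the float point
            kept.append(points[anchor])
            float_i = anchor + 2
        else:
            float_i += 1  # the window still fits: grow it
    kept.append(points[-1])
    return kept
```

On the points (0, 0), (1, 0), (2, 5), (3, 0), (4, 0) with a threshold of 1, this sketch keeps (0, 0), (2, 5) and (4, 0); on longer, noisier series its output can differ from that of DP, because each decision only sees the current window.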

A similar approach based on line generalization was taken in [4]. Nevertheless, its authors focus more on the impact that compression has on the errors of query answers. They consider several distance measures in the line simplification algorithm, among which the Euclidean distance and the synchronous distance (see Section 2.1.2). They also analyze several query types, e.g., intersection, nearest neighbor and spatial join, and show that if the compression error is bounded, then the error of the query answer is also bounded. This also applies to our context.

2.1.2 The Douglas-Peucker algorithm and the synchronous distance

The DP algorithm takes as input a data series of 2D points and a user-defined threshold value, and outputs a subseries of the input data series. It first approximates the entire input data with a line segment from the first to the last data point. Then, it measures the Euclidean distance between each intermediate data point and the line segment and retains the data point having the maximum distance. If the maximum distance exceeds the threshold value, it uses this data point to split the data series into two subseries and recursively repeats the procedure for each subseries. The algorithm stops when the maximum distance in a subseries is lower than the predefined threshold or the subseries contains only two data points. Figure 3 illustrates a simple case of the algorithm. The original data series contains seven points. The distances from the points {b,c,d,e,f} to the segment (a,g) are first computed. Since the maximum distance, at point d, exceeds the input threshold value, the data series is split at this point. In the left subseries, the distances from b and c to (a,d) are below the threshold value, so {a,d} are sufficient to approximate it. In the right subseries, the distance from f to (d,g) also exceeds the predefined threshold, hence a new split is performed at this data point. The output of the algorithm is the subseries {a,d,f,g}.
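The procedure above can be sketched in Python as follows. This minimal version (not the authors' implementation) uses the Euclidean point-to-segment distance and an explicit stack instead of recursion, which retains the same subseries; the function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the line segment (a, b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a == b
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection of p onto the segment, clamped to its end points.
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def douglas_peucker(points: List[Point], threshold: float) -> List[Point]:
    """Return the subseries of `points` retained by the Douglas-Peucker algorithm."""
    if len(points) <= 2:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # subseries still to examine
    while stack:
        first, last = stack.pop()
        max_dist, split = -1.0, None
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, split = d, i
        # Split only if the farthest point exceeds the threshold.
        if split is not None and max_dist > threshold:
            keep[split] = True
            stack.append((first, split))
            stack.append((split, last))
    return [p for p, k in zip(points, keep) if k]
```

For instance, with a threshold of 1, the series (0, 0), (1, 0), (2, 5), (3, 0), (4, 0) is reduced to (0, 0), (2, 5), (4, 0).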

The straightforward implementation of the DP algorithm has a worst-case running time of O(N²), where N is the number of data points in the original data series. Several optimizations of the algorithm have been proposed [13, 16] that improve the worst-case running time to O(N log N) and O(N logᵏ N) (where k ∈ {2,3}). Note that the optimal line simplification algorithm [17] has a running time of O(N²) in the naive implementation. Nonetheless, Imai and Iri describe in [17] realizations of the optimal algorithm with worst-case complexities of O(N² log N) and even O(N log N).

Line generalization algorithms such as DP only consider the spatial properties of the data. Yet trajectory data have an important extra dimension, which is time. Cao et al. defined in [3] several distance functions between the spatio-temporal locations of trajectories. Among these distance functions, the time-uniform distance considers both the spatial and the temporal dimensions. Based on this distance, Meratnia and de By introduced the notion of time ratio distance, or synchronous distance [24]. This is illustrated in Fig. 4. The segment (Pₛ, Pₑ) represents the approximation of the trajectory between tₛ and tₑ obtained by linear interpolation. For each point Pᵢ at tᵢ ∈ (tₛ, tₑ) in the original data series, the distance to (Pₛ, Pₑ) is equal to the length of (Pᵢ, Pᵢ′), where Pᵢ′ is the point at tᵢ on the approximated trajectory. The Euclidean distance is then replaced with the synchronous distance in the line simplification algorithms, which leads to much smaller errors with only a marginal loss in the compression rate.
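The following minimal Python sketch computes the synchronous distance of Fig. 4 for (t, x, y) samples and plugs it into a DP-style simplification. The names and the tuple layout are assumptions; note that this is the 2D synchronous distance, not the synchronous network distance introduced in Section 5.

```python
import math
from typing import List, Tuple

TimedPoint = Tuple[float, float, float]  # (t, x, y)


def synchronous_distance(p: TimedPoint, start: TimedPoint, end: TimedPoint) -> float:
    """Length of (P_i, P_i'), where P_i' is the point at t_i on segment (P_s, P_e)."""
    ti, xi, yi = p
    ts, xs, ys = start
    te, xe, ye = end
    r = 0.0 if te == ts else (ti - ts) / (te - ts)  # time ratio of t_i in [t_s, t_e]
    x_prime = xs + r * (xe - xs)
    y_prime = ys + r * (ye - ys)
    return math.hypot(xi - x_prime, yi - y_prime)


def douglas_peucker_sync(points: List[TimedPoint], threshold: float) -> List[TimedPoint]:
    """DP simplification using the synchronous distance instead of the Euclidean one."""
    if len(points) <= 2:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True

    def split(first: int, last: int) -> None:
        max_dist, index = -1.0, None
        for i in range(first + 1, last):
            d = synchronous_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        if index is not None and max_dist > threshold:
            keep[index] = True
            split(first, index)
            split(index, last)

    split(0, len(points) - 1)
    return [p for p, k in zip(points, keep) if k]
```

For instance, with samples (0, 0, 0), (1, 9, 0) and (2, 10, 0), the middle point lies exactly on the spatial segment, so a purely spatial DP would drop it; its synchronous distance, however, is 4, because the interpolated position at t = 1 is (5, 0).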

We employ the synchronous network distance (introduced in Section 5) combined with the DP algorithm to compress in-network trajectories with deterministic error bounds. Whereas this is sufficient to obtain a compression algorithm for in-network trajectories, this simple adaptation only leads to low compression rates (see Fig. 2). The reason is that the typical data models [7, 14, 29] for in-network trajectories require that a new data point be created each time the MO changes road. This means that the number of points in a trajectory cannot go below the number of traversed roads, no matter how large the accepted compression error is.

Fig. 3 Douglas-Peucker algorithm


2.1.3 Sampling trajectory streams

The compression algorithms based on line simplification techniques can be adapted to online compression, as proposed in [24]. However, the time complexity of these algorithms is O(N²), where N is the number of points in the data series. Although they can still be employed for relatively short data series, these techniques might not scale well to very large data streams.

Potamias et al. proposed in [26] two efficient methods to compress (multiple) trajectory streams on the fly. The resulting stream can have either a fixed or a variable size. The idea is to use the last two stored positions of a trajectory stream to compute a velocity vector. Then, based on the velocity vector and two user-specified thresholds indicating the tolerable changes in speed and in the orientation of the velocity vector, they compute a safe zone for each new incoming data point in the stream. If the data point is in the safe zone, it is discarded, since the point can be predicted from the current movement pattern. Otherwise, the point is stored in the trajectory sample. The advantage of such an approach is its constant time complexity, i.e., O(1), for each position update. The trade-off lies in the compression quality and in the fact that the introduced errors do not have deterministic bounds.
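A simplified reading of the safe-zone idea can be sketched in Python as below: a new point is discarded when its speed and heading, measured from the last stored point, stay within the two tolerances relative to the velocity vector of the last two stored points. This is an assumption-laden illustration with hypothetical names, not the exact safe-area construction of [26]; it keeps the O(1) cost per update and, like the original, gives no deterministic error bound.

```python
import math
from typing import List, Tuple

TimedPoint = Tuple[float, float, float]  # (t, x, y); timestamps strictly increasing


def _velocity(a: TimedPoint, b: TimedPoint) -> Tuple[float, float]:
    """Velocity vector of the movement from a to b."""
    dt = b[0] - a[0]
    return (b[1] - a[1]) / dt, (b[2] - a[2]) / dt


def in_safe_zone(prev: TimedPoint, last: TimedPoint, new: TimedPoint,
                 speed_tol: float, angle_tol_rad: float) -> bool:
    """True if `new` is predictable from the velocity of the last two stored points."""
    vx, vy = _velocity(prev, last)  # current movement pattern
    nx, ny = _velocity(last, new)   # movement implied by the new point
    speed, new_speed = math.hypot(vx, vy), math.hypot(nx, ny)
    if abs(new_speed - speed) > speed_tol:
        return False
    if speed == 0.0 or new_speed == 0.0:
        return True  # speed change is tolerable and there is no heading to compare
    cos_angle = (vx * nx + vy * ny) / (speed * new_speed)
    return math.acos(max(-1.0, min(1.0, cos_angle))) <= angle_tol_rad


def sample_stream(stream: List[TimedPoint], speed_tol: float,
                  angle_tol_rad: float) -> List[TimedPoint]:
    """Keep only the points that fall outside the safe zone (O(1) work per point)."""
    kept: List[TimedPoint] = []
    for point in stream:
        if len(kept) >= 2 and in_safe_zone(kept[-2], kept[-1], point,
                                           speed_tol, angle_tol_rad):
            continue  # predictable from the current movement: discard
        kept.append(point)
    return kept
```

For example, an object moving at constant speed along the x-axis and then turning by about 27 degrees keeps only its first two points and the turning point when the angle tolerance is 10 degrees.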

Also, Trajcevski et al. show in [31] that an online data reduction mechanism with deterministic error bounds is possible if an agreement (called a dead-reckoning policy) is made between the MO and the data server regarding the updates transmitted by the MO. In the dead-reckoning policy, the MO sends updates in the form (location, time, velocity) to the server. Based on the last update, the server estimates the position of the MO at a future point in time. A new update is issued only when the actual position of the MO deviates from its estimated location by more than δ, i.e., a threshold value specified in the agreement. More recently, Lange et al. proposed in [21] a class of methods based on dead reckoning to efficiently track the trajectories of MOs. Their methods allow a trade-off between the accuracy of the tracked data and the communication cost between the MO and the server. Again, only the case of free MOs is considered in these works.
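The dead-reckoning policy can be simulated with the minimal Python sketch below, assuming linear prediction from the last update (position plus velocity times elapsed time) and a Euclidean deviation test against δ (`delta`). The `(t, x, y, vx, vy)` report layout and the function name are hypothetical choices.

```python
import math
from typing import List, Tuple

Report = Tuple[float, float, float, float, float]  # (t, x, y, vx, vy)


def dead_reckoning_updates(track: List[Report], delta: float) -> List[Report]:
    """Return the updates an MO would send under a simple dead-reckoning policy.

    track: actual (t, x, y, vx, vy) samples observed on the device, in time order.
    """
    if not track:
        return []
    updates = [track[0]]  # the first position is always reported
    for t, x, y, vx, vy in track[1:]:
        t0, x0, y0, vx0, vy0 = updates[-1]
        # Position the server predicts from the last update it received.
        px, py = x0 + vx0 * (t - t0), y0 + vy0 * (t - t0)
        if math.hypot(x - px, y - py) > delta:
            updates.append((t, x, y, vx, vy))  # deviation exceeds delta: report
    return updates
```

Between two updates, the server's estimate therefore stays within δ of the sampled positions, which is where the deterministic bound comes from.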

2.2 STC for in-network MOs

To the best of our knowledge, the works presented in [18, 27] are the only ones to explicitly consider the problem of in-network trajectory compression. Kellaris et al. [18] proposed two approaches to the problem of trajectory compression for constrained MOs. The first approach is to consider the 2D representation of the trajectory instead of the network representation, and then to use the adapted DP algorithm for free MOs proposed in [24] to obtain a compressed trajectory. Recall that the DP algorithm produces a subseries of the initial data series. Hence, the points in the resulting compressed trajectory can be map-matched back to the network space, which yields a compressed in-network trajectory. This solution is not appropriate, since it uses for compression the synchronous Euclidean distance (Fig. 4) between trajectory points and not the network distance. Since the error bound is computed with respect to the 2D space and not to the network space, there are no guarantees on the error bound in the network space.

Fig. 4 Synchronous distance

The second approach proposed in [18] is to replace different parts of a trajectory with the shortest paths in the network between their start and end points, if the actual paths differ from the shortest paths. A compressed trajectory is then obtained, as the shortest paths require less storage space. Again, this type of approach is not satisfactory, for at least two reasons. First, compression modifies the actual path of a trajectory. Second, it offers no guarantees on the compression quality, i.e., no error bounds.

More recently, the concept of semantic trajectory compression was introduced in [27]. Based on the observation that most human mobility occurs in transportation networks, Richter et al. [27] exploit the high-level semantic annotation of the transportation network (e.g., street names or bus, tram and train lines) to represent and compress the trajectories. As with the methods proposed in [18], semantic compression does not guarantee an error bound. In addition, the compressed data has to be decompressed before applications can carry out their computations.

We also mention the work of Cao and Wolfson [2], who proposed an interesting alternative for in-network trajectory representation, called non-materialized trajectory. Although a non-materialized trajectory minimizes the trajectory representation by considering the transport network, the focus in [2] is more on map-matching and spatio-temporal querying with error bounds than on STC.

2.3 Trajectory based network generalization

Since the constrained model is defined with reference to the network space, trajectory compression is strongly impacted by the network representation. To address this problem, we aggregate the fine-grained network edges into longer network paths that are shared among many trajectories. To underline the particularity of the proposed method for computing network paths (see Section 4), we briefly discuss in this section several works related to computing network paths either from a dataset of trajectories or directly from the geometric representation of the network.

A network path containing many (parts of) trajectories can be regarded as a cluster of sub-trajectories. Therefore, one may legitimately ask whether the existing trajectory clustering solutions can be applied in our context. Many works consider the problem of clustering a dataset of trajectories. Lee et al. [22, 23] tackle the problem of trajectory clustering by proposing a two-phase approach, i.e., a partitioning phase and a grouping phase. A key observation in [22, 23] is that grouping sub-trajectories is more useful to many applications and may also lead to better clustering results than grouping trajectories as a whole. Thus, each trajectory is first partitioned into representative trajectory line segments. Interestingly, the proposed trajectory partitioning algorithm uses the minimum description length principle to find the best trade-off between the preciseness and the conciseness of the trajectory partitioning, which is to some degree similar to finding the best compressed trajectory that minimizes the compression approximation error [18]. Then, the obtained sub-trajectories are clustered by using a well-known density-based clustering method, i.e., DBSCAN [10], and a specific distance function between two trajectory line segments. However, in our context, these trajectory clustering algorithms suffer from a major drawback, since they are designed for trajectories of freely moving objects (e.g., the movements of animals, vessels or hurricanes) that are represented in Euclidean space. One idea would be to adapt these algorithms by transforming the 2D polyline representation of the trajectories into a one-dimensional, straight-line representation [25]. However, this type of space transformation is superfluous if the trajectories are represented directly in the network space. Moreover, a space transformation does not reflect the important information captured by the network topology. Therefore, a more effective clustering can be obtained by methods adapted to constrained trajectories.

A few works [15, 19, 28] consider the problem of clustering in-network trajectories. Based on the observation that spatial proximity in Euclidean space is not the same as proximity in a network space, NNCluster [28] proposes a distance measure between trajectories that reflects road network proximity, i.e., by using the shortest path between two network locations. Then, clusters of closely located trajectories are computed with a typical clustering algorithm. Despite considering in-network trajectories, NNCluster cannot be applied in our context, since it groups whole trajectories that are not necessarily overlapping, thus producing clusters that cover network sub-graphs rather than paths in the road network. NETSCAN [19] and NEAT [15] propose a different approach to cluster in-network trajectories. The idea is to first build the dense paths, i.e., the paths that have the highest trajectory flows in the road network, and then to cluster the sub-trajectories that intersect these paths. Specifically, a dense path construction starts at the densest road segment in the network, which is extended at both ends with the adjacent road segments that maximize the trajectory flow between the segments, until the flow density falls under a threshold value. Then, new dense paths are discovered following the same principle. The process stops when all the road segments have been visited or the remaining segments have lower flow densities than the threshold parameter. The obtained dense paths represent the clusters. In addition, NEAT has a cluster refinement phase in which the dense paths located within a certain network distance of each other are merged into a larger cluster.
Also, it is worth mentioning that [15] shows NEAT to be much more effective and efficient than TRACLUS [23] in finding in-network trajectory clusters, which indicates the importance of having an appropriate data representation model and algorithms for constrained trajectories. However, NETSCAN and NEAT are designed to discover only the dominant paths in the network. These dense paths typically cover only a small part of the network edges and do not produce a global network partitioning. Therefore, many sub-trajectories in the dataset do not intersect these dense paths and cannot benefit from increased compression rates.
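The dense-path construction described for NETSCAN can be sketched as a greedy procedure in Python. This is a simplified interpretation under stated assumptions, not the published algorithm: segments are undirected `(node, node)` pairs, `flow` counts the trajectories on each segment, `transition` counts the trajectories moving between two adjacent segments, and a path is extended at each end with the unvisited adjacent segment of maximal transition flow while that flow is at least `threshold`. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # undirected road segment between two junction nodes


def _other_end(edge: Edge, node: str) -> str:
    """Return the end node of `edge` that is not `node`."""
    return edge[1] if edge[0] == node else edge[0]


def dense_paths(flow: Dict[Edge, int],
                transition: Dict[FrozenSet[Edge], int],
                threshold: int) -> List[List[Edge]]:
    """Greedy dense-path construction, loosely following the NETSCAN idea."""
    incident: Dict[str, List[Edge]] = {}
    for seg in flow:
        for node in seg:
            incident.setdefault(node, []).append(seg)

    visited: Set[Edge] = set()
    paths: List[List[Edge]] = []
    # Seeds are taken in decreasing order of flow density.
    for seed in sorted(flow, key=flow.get, reverse=True):
        if flow[seed] < threshold:
            break  # every remaining segment is below the threshold
        if seed in visited:
            continue
        visited.add(seed)
        path = [seed]
        path_nodes = {seed[0], seed[1]}
        # Extend at the tail (beyond seed[1]), then at the head (beyond seed[0]).
        for at_tail, end_node in ((True, seed[1]), (False, seed[0])):
            while True:
                end_seg = path[-1] if at_tail else path[0]
                best: Optional[Edge] = None
                best_flow = -1
                for cand in incident[end_node]:
                    # Skip used segments and segments that would close a cycle.
                    if cand in visited or _other_end(cand, end_node) in path_nodes:
                        continue
                    f = transition.get(frozenset((end_seg, cand)), 0)
                    if f >= threshold and f > best_flow:
                        best, best_flow = cand, f
                if best is None:
                    break
                visited.add(best)
                end_node = _other_end(best, end_node)
                path_nodes.add(end_node)
                if at_tail:
                    path.append(best)
                else:
                    path.insert(0, best)
        paths.append(path)
    return paths
```

For example, on segments (A, B), (B, C), (C, D) and (B, E) with flows 10, 9, 8 and 2 and a threshold of 3, the sketch returns the single dense path (A, B), (B, C), (C, D); segment (B, E) belongs to no path, which illustrates why dense paths do not yield a global network partitioning.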

The network partitioning algorithm proposed in this paper uses a dataset of trajectories to compute the network transformation. Alternatively, a route-oriented network transformation can be obtained directly from the geometric representation of the network, independently of any dataset. Indeed, the idea of creating long routes to decrease the number of roads in the network is not new. In [6] the routes are created by concatenating the edges having the same street code. Also, based on the observation that MOs tend to move as straight as possible towards their destinations, it would be useful to extend a road with the segment having the smallest angle among the candidate segments. Specifically, the longest segment in the network is extended at both ends with the segment having the smallest angle (with the current segment) among the candidate segments. The extension of the current path ends when the minimum angle between the candidate segments and the end segment of the current road exceeds 90° or there are no more candidate segments. The advantage of this type of approach is that the resulting partitioned network can be used in combination with any trajectory dataset, since only the network topology and geometry are considered to partition the network, whereas the data distribution and density are disregarded. However, we expect this approach to yield lower compression rates than the proposed method, since it cannot adapt to the actual movement patterns.


3 A generalized in-network trajectory data model

The constrained movement requires specific data representations and specific query models [14, 29]. As indicated earlier, in the case of constrained movement it is important that the data representation be related to the network space instead of the 2D space. This section presents the network and the data models for in-network trajectory compression.

3.1 Network model

Several network models have been proposed to represent and index the movements of constrained MOs, e.g., [14, 29, 30]. In this paper, we adopt a network model similar to the model proposed in [14, 29], which is the basis for modeling in-network MOs. Essentially, the network model is based on two representations of the road network: a geometric view and a topological view. The geometric view captures an approximate geometric description of the network, i.e., each road is represented as a polyline in the Euclidean space. The topological view uses a graph to capture the intersections between roads.

Formally, a road network is an undirected graph G = (V, E), where V is a set of vertices and E ⊆ V × V is a set of edges. Each v ∈ V corresponds to a road junction having the geometric coordinates (xv, yv) in the 2D space. An edge e ∈ E connects two nodes, e = (v1, v2). In the geometric representation, the edge e is associated with a 2D polyline, i.e., a sequence of 2D points ⟨v1, iv1, iv2, …, ivk, v2⟩, where each two consecutive points are connected by a line segment, each ivi is an intermediate polyline vertex, and k ≥ 0.

Definition 1 Network models: For a given network G = (V, E), we define two possible network granularities that can be superimposed on the road network, resulting in different network models:

• Edge-oriented network model: each edge e ∈ E represents a distinct road in the network, i.e., Road = (rid, ei, start), where rid is a unique identifier, ei ∈ E and start ∈ V is one of the two end-nodes of the edge ei.

• Route-oriented network model: the complete roads are considered without split. A road is a set of connected edges that form a non-self-intersecting path in the network, i.e., Road = (rid, Se, start), where Se = {e1, …, ek | ei ∈ E} and start ∈ V is one of the two end-nodes of the graph path Se. There is no overlap between the roads, i.e., $\forall i \neq j,\ S_e^i \cap S_e^j = \emptyset$.

Definition 2 Network mapping function: We denote by φ: E → ℕ the mapping function that associates each edge e ∈ E with a road identifier rid, where rid is a natural number.

Note that the number of roads is less than (e.g., route-oriented network model) or equal to (e.g., edge-oriented network model) the number of network edges. A typical bijective function for the edge-oriented network model is φ0: E → {1, 2, …, |E|} with φ0(ei) = i.

Figure 5 depicts the geometric representation of a simple network. In the edge-oriented network model (Fig. 5(a)), there are seven roads corresponding to the seven edges, whereas in the route-oriented model (Fig. 5(b)), there are only four roads. Hereafter, we use the general term road to denote a road in a network modeled with either of the two network models. When necessary, we indicate the specific network model for the given network.

Definition 3 Network position: Given a road network G = (V, E) and a mapping function φ, a position (location) in the network space is defined by the pair (rid, pos), where rid is a road identifier and pos ∈ [0, 1] is the relative position on the road measured from the start end-node of the road.

The definition of a position in the network space is based on the concept of linear referencing, widely used in GIS for transportation. The actual (physical) distance between two network locations is computed using the geometric network representation.
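The linear-referencing conventions of Definition 3 can be illustrated with a short sketch. It assumes that a road is given as the list of 2D vertices of its polyline, ordered from its start end-node; the function names are hypothetical. A relative position pos is mapped to a point by walking along the polyline until the fraction pos of its total length is reached, and for two positions on the same road the network distance along that road reduces to |pos_a − pos_b| times the road length.

```python
import math


def polyline_length(points):
    """Geometric length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def locate(points, pos):
    """Map a relative position pos in [0, 1], measured from the road's start
    end-node (points[0]), to a 2D point on the road polyline."""
    if not 0.0 <= pos <= 1.0:
        raise ValueError("pos must lie in [0, 1]")
    remaining = pos * polyline_length(points)
    for p, q in zip(points, points[1:]):
        seg = math.dist(p, q)
        if seg > 0 and remaining <= seg:
            f = remaining / seg
            return (p[0] + f * (q[0] - p[0]), p[1] + f * (q[1] - p[1]))
        remaining -= seg
    return points[-1]  # guards against rounding when pos == 1


def same_road_distance(points, pos_a, pos_b):
    """Network distance between (rid, pos_a) and (rid, pos_b) on the same road."""
    return abs(pos_a - pos_b) * polyline_length(points)
```

For instance, on a road with vertices (0, 0), (3, 0), (3, 4) and length 7, position 0.5 lies 3.5 length units from the start, i.e., at (3.0, 0.5) on the second polyline segment.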

3.2 Data model

The trajectory of a constrained MO consists of a sequence of in-network locations reported at different moments in time, i.e., (moid, rid, pos, t), where moid is an MO identifier and t is a time instant. Different policies can be employed for updates. For example, the location can be updated at a regular frequency, when a sudden change in speed occurs, or when the MO traverses a road junction. We consider the general case where the interval between two consecutive updates is variable.

3.2.1 Basic data model

In order to have a continuous view of the trajectory, linear interpolation is used to determine the position of the MO at any time instant in the interval between two consecutive updates [14]. Thus, the so-called unit representation adopted in the moving object databases area is used to represent a trajectory: an MO trajectory is defined as a sequence of trajectory units, i.e.,

$$Tr = \{moid, \langle (rid_1, pos_1^1, t_1^1, pos_2^1, t_2^1), \ldots, (rid_m, pos_1^m, t_1^m, pos_2^m, t_2^m) \rangle\}$$

where $pos_1^i$ and $pos_2^i$ give the relative positions on the road at the two time instants $t_1^i$ and $t_2^i$. Note that $pos_1^i \le pos_2^i$ or $pos_1^i \ge pos_2^i$ depending on the unit orientation.

Observation 1: Given a 2D (road) network, the network space is often referred to in the literature as a 1.5-dimensional space. In a network space, there is a continuous dimension, i.e., the relative position on a given road, and a discrete dimension, i.e., the set of roads in the network. A convention of the constrained data model is that a new trajectory unit is created each time the MO passes from one road to another. Thus, the road junctions represent a kind of discontinuity point in the network space. This is different from the Euclidean space, where each spatial dimension is continuous.

The classical data model summarized above is not adapted to STC. The major limitation results from the discontinuity points in the network. Let us consider a trajectory that extends over several network roads (Fig. 6). A trajectory compression algorithm such as line generalization (Section 2.1.2) works by eliminating intermediate trajectory points. In the data model based on trajectory units, the elimination of a data point results in merging two adjacent units. However, since a new trajectory unit must be created at each road change, the data points situated at road junctions cannot be eliminated through compression.

Fig. 5 Example of a network (a) in the edge-oriented model and (b) in the route-oriented model


3.2.2 Generalized data model

To deal with this important limitation, we propose in this section an extended data model that is better adapted to trajectory compression. In the following, we consider that the spatial projection of a trajectory consists of a single network path, i.e., there are no gaps in a trajectory. Nevertheless, extending the data model to trajectories with gaps is straightforward by modeling a trajectory as a finite set of continuous trajectory components.

Definition 4 Road segment: Given a road network G = (V, E) and a mapping function φ, a road segment (road portion) is defined as s = (rid, pos1, pos2), where rid is the road identifier and pos1, pos2 ∈ [0, 1] are two relative positions on the road measured from the start end-point of the road.

Definition 5 Generalized unit: Given a road network G = (V, E) and a mapping function φ, a generalized (trajectory) unit is defined as gu = (Ss, t1, t2), where Ss = ⟨s1, …, sk⟩ is a finite sequence of connected road segments such that ridi ≠ ridi+1 for all consecutive segments si, si+1 (1 ≤ i ≤ k−1), and t1, t2 indicate the time interval of the generalized unit.

Definition 6 Trajectory model: Given a road network G = (V, E) and a mapping function φ, an MO trajectory is defined as Tr = {moid, ⟨gu1, …, gum⟩}, where moid is the MO identifier and ⟨gu1, …, gum⟩ is a sequence of connected generalized units.

Note that the above definitions are independent of the network model, i.e., edge or route-oriented. Note also the two observations below about the introduced model.

Observation 2: The data model based on generalized units eliminates the discontinuity problem, since a trajectory unit can extend over road junctions. In addition, the extended data model includes the classical data model, because a base data unit can be viewed as a particular case of a generalized data unit. One might think that a generalized unit storing only the start and end locations (similar to the base trajectory unit) is sufficient to represent a trajectory. While this type of approach is beneficial for the compression rate, it nonetheless raises some problems. First, each time a trajectory is queried, we need to retrace the units extending over different roads (i.e., decompression). Finding the path between two network positions is a very costly process. Moreover, because there may be several possible paths between two locations, the data model can no longer guarantee the exactness of the trajectory path. Second, the index methods for in-network trajectories [7, 29], which are needed to optimize query processing, require an explicit representation of the indexed data.

Observation 3: Given a road network in the edge-oriented or route-oriented representation and a generalized unit gu with Ss = ⟨s1, …, sk⟩, then for every si (2 ≤ i ≤ k−1), the relative positions {pos1, pos2} of each road segment si can be easily inferred from the network topology. Likewise, one can obtain pos2 of s1 and pos1 of sk.

Fig. 6 Classical versus generalized in-network trajectory data model

Hence, a generalized unit can be summarized as follows:

Definition 7 Summarized generalized unit: Given a road network G = (V, E) and a mapping function φ, a summarized generalized (trajectory) unit is defined as $sgu = (pos_1^1, \langle rid_1, rid_2, \ldots, rid_n \rangle, pos_2^n, t_1, t_2)$, where $pos_1^1$ and $pos_2^n$ indicate the relative positions on rid1 at t1 and on ridn at t2, respectively.

Definition 8 Summarized trajectory model: Given a road network G = (V, E) and a mapping function φ, a summarized MO trajectory is defined as STr = {moid, ⟨sgu1, …, sgum⟩}, where moid is the MO identifier and ⟨sgu1, …, sgum⟩ is a sequence of connected summarized generalized units.

Using the summarized trajectory representation increases the compression rate. At the same time, processing a summarized trajectory (e.g., query processing) incurs an overhead, since the trajectory has to be “decompressed”, i.e., the relative positions of the road segments need to be recomputed.

Figure 6 depicts an example of an MO trajectory representation using both the classical data model [14] and the proposed generalized data model. The road network in the example is edge-oriented. The trajectory extends between the network locations A and F. We also assume that the MO moves at a constant speed between A and D and at a different speed between D and F. The classical data model requires five units to represent the trajectory, i.e., Tr = {moid, ⟨(e7, 0.5, tA, 1, tB), (e5, 1, tB, 0, tC), (e1, 1, tC, 0.7, tD), (e1, 0.7, tD, 0, tE), (e4, 0, tE, 0.6, tF)⟩}, as a new unit is created each time the MO changes the road (edge) or its speed. The generalized data model requires only two units in this example, i.e., Tr = {moid, ⟨(⟨(e7, 0.5, 1), (e5, 1, 0), (e1, 1, 0.7)⟩, tA, tD), (⟨(e1, 0.7, 0), (e4, 0, 0.6)⟩, tD, tF)⟩} according to Definition 6, or Tr = {moid, ⟨(0.5, ⟨e7, e5, e1⟩, 0.7, tA, tD), (0.7, ⟨e1, e4⟩, 0.6, tD, tF)⟩} in its summarized version (cf. Definition 8). Compared to the basic data model, the generalized data model has the advantage of eliminating time information when it is not required in the trajectory representation. In addition, in its summarized version, the generalized model also removes the intermediate position information, which leads to more compact trajectory representations.
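Definition 7 and Observation 3 can be illustrated on this example with the sketch below, which summarizes a generalized unit and then recomputes its intermediate positions from the topology. It assumes an edge-oriented network in which consecutive roads of a unit meet at a shared end node; in a route-oriented network the junction may lie inside a road, and its relative position would have to be looked up instead. The node labels a, b, c, d and f are hypothetical, chosen only to be consistent with the units listed above, and the function names are also hypothetical.

```python
def summarize(segments, t1, t2):
    """Definition 7: keep the entry position on the first road, the sequence of
    road ids, and the exit position on the last road."""
    return (segments[0][1], [rid for rid, _, _ in segments], segments[-1][2], t1, t2)


def _shared_node(road_ends, r1, r2):
    """End node where consecutive roads r1 and r2 meet (edge-oriented network)."""
    common = set(road_ends[r1]) & set(road_ends[r2])
    if len(common) != 1:
        raise ValueError(f"roads {r1} and {r2} must meet at exactly one end node")
    return common.pop()


def _node_pos(road_ends, rid, node):
    """Relative position of an end node on a road: 0 at its start node, 1 at the other end."""
    return 0.0 if node == road_ends[rid][0] else 1.0


def expand(sgu, road_ends):
    """Observation 3: recompute every road segment (rid, pos1, pos2) of a summarized unit.
    road_ends maps rid -> (start node, end node)."""
    pos_first, rids, pos_last, t1, t2 = sgu
    segments = []
    for i, rid in enumerate(rids):
        if i == 0:
            entry = pos_first
        else:
            entry = _node_pos(road_ends, rid, _shared_node(road_ends, rids[i - 1], rid))
        if i == len(rids) - 1:
            exit_pos = pos_last
        else:
            exit_pos = _node_pos(road_ends, rid, _shared_node(road_ends, rid, rids[i + 1]))
        segments.append((rid, entry, exit_pos))
    return segments, t1, t2
```

With road_ends = {e7: (a, b), e5: (c, b), e1: (d, c), e4: (d, f)}, expanding (0.5, ⟨e7, e5, e1⟩, 0.7, tA, tD) returns the three segments (e7, 0.5, 1), (e5, 1, 0), (e1, 1, 0.7) of the first generalized unit. The round trip costs one pass over the road ids, which is the “decompression” overhead mentioned above.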

4 Network partitioning into network paths

As indicated in Observation 1, a peculiar feature of the network space compared to the Euclidean space is the presence of discontinuity points. Therefore, given a trajectory unit, we need to indicate all the road segments traversed between the two network locations delimiting the unit. That is not the case for non-constrained MOs, where a 2D line segment is implicitly built between two consecutive data points.

A compression algorithm decreases the number of units of a trajectory by removing some data points; the compression rate can be further increased by reducing the size of the representation of each resulting trajectory unit. In this section, we propose an algorithm that partitions the network into long paths to obtain an additional compression margin without increasing the error bound. The idea is to decrease the fragmentation of the network space with respect to a set of trajectories that traverse the network. Hence, the network partitioning algorithm presented hereafter couples the data model presented in Section 3, which represents individual trajectories, with the dataset of trajectories. It offers a way of computing an appropriate route-oriented network representation that is beneficial for the compact representation of a dataset of trajectories.

Observation 4: Storing additional information besides the two network locations of a trajectory unit entails lower data compression rates. Note that if a trajectory is contained in a single road, then each trajectory unit has only one road segment regardless of the unit length. Therefore, no additional information is needed in this case. In general, the lower the number of traversed roads, the less additional information we need to store. One idea is therefore to partition the network into long roads, with the aim of minimizing the number of roads traversed by a given trajectory. This is possible with a route-oriented network model.

Based on Observation 4, we formulate the following optimization problem. Given a road network G = (V, E), the mapping function φ0 (see Definition 2) and a dataset of trajectories D = {Tr1, Tr2, …, Trn}, find a mapping function φopt such that the indicator

$$I = \sum_{i=1}^{n} roads(Tr_i)$$

is minimum, where roads(Tri) is the number of roads traversed by trajectory Tri. This comes down to finding a partitioning of the network graph edges into disjoint partitions such that the edges in each partition form a non-self-intersecting path in the network and I is minimum.
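The indicator I can be computed with the short sketch below. It assumes that each trajectory is given as the sequence of edges it traverses and counts roads(Tr) as the number of maximal runs of consecutive edges mapped to the same road (so a trajectory that leaves a road and later returns to it counts that road twice). The toy dataset and all names are hypothetical.

```python
def edge_mapping(edge_ids):
    """phi_0: the edge-oriented mapping, phi_0(e_i) = i."""
    return {e: i for i, e in enumerate(edge_ids, start=1)}


def route_mapping(roads):
    """Mapping for a route-oriented network given as a list of edge lists;
    road k receives rid k. Roads must not share edges (Definition 1)."""
    phi = {}
    for rid, road in enumerate(roads, start=1):
        for e in road:
            if e in phi:
                raise ValueError(f"edge {e} is assigned to roads {phi[e]} and {rid}")
            phi[e] = rid
    return phi


def roads_traversed(edge_sequence, phi):
    """roads(Tr): number of maximal runs of consecutive edges mapped to the same rid."""
    count, previous = 0, None
    for e in edge_sequence:
        if phi[e] != previous:
            count += 1
            previous = phi[e]
    return count


def indicator(dataset, phi):
    """I = sum over the dataset of roads(Tr_i); dataset maps trajectory id -> edge list."""
    return sum(roads_traversed(seq, phi) for seq in dataset.values())
```

For three toy trajectories ⟨e1, e2, e3⟩, ⟨e1, e2⟩ and ⟨e4, e2, e3⟩, the edge-oriented mapping φ0 gives I = 3 + 2 + 3 = 8, while grouping e1, e2, e3 into one road and e4 into another gives I = 1 + 1 + 2 = 4.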

The graph partitioning problem is NP-complete [12]. Moreover, like our optimization problem, the graph partitioning problem is associated with an objective function (e.g., partition the graph into n connected parts such that the partitions are balanced w.r.t. the number of edges in each partition). Therefore, heuristics are necessary to obtain good-quality approximate solutions at a reasonable cost. To the best of our knowledge, there is currently no solution for the graph partitioning problem formulated above. Therefore, we propose a greedy algorithm (Algorithm 1) for computing the mapping function φ. Although not optimal, the resulting mapping function brings the value of the indicator I close to its minimum, and its computation has low complexity. The idea is to use the spatial distribution of the trajectory data D to compute network paths that are shared among several trajectories. The objective is to maximize the length of the network paths, which leads to a lower number of discontinuities in the network. Hence, the algorithm starts with the densest edges in the network (line 6 in Algorithm 1). That is, given a network edge ei ∈ E, we find the list of trajectories $L_i = \{Tr_1^i, Tr_2^i, \ldots, Tr_p^i\}$ that intersect ei. Then, we create a path by connecting an adjacent edge ej ∈ E to the current edge. Note that only the edges that are connected to the path extremities are considered (line 7 in Algorithm 1). Since in the general case there are several edges adjacent to the path, we choose the edge ej that maximizes |Lp ∩ Lj|, where Lp (initially Li) is the list of trajectories shared along the current path, i.e., the extension that maximizes the number of shared trajectories (line 8 in Algorithm 1). Lp ∩ Lj then becomes the trajectory list associated with the extended path (line 9 in Algorithm 1). The process continues until no edge adjacent to the current path shares at least one trajectory with the path. Also, as an edge can belong to a single path (i.e., there is no overlap between the paths, cf. Definition 1), the concatenated edges are eliminated from the edge list (line 10 in Algorithm 1).

The time complexity of Algorithm 1 is linear with regard to the number of units in the trajectory dataset and log-linear with regard to the number of network edges, i.e., Θ(|E| log |E| + |units(D)|). Similarly, the space complexity of the algorithm is Θ(|E| + I), where I is computed as above. Note that the focus of this paper is on compressing historical datasets, and the partitioning obtained with Algorithm 1 is optimized for a static dataset. When compressing trajectory streams, the algorithm can still be used to partition the network based on representative historical data, since MO trajectories tend to follow the same spatial distribution [29].


Algorithm 1: Computing Network Paths

Input: Network graph G(V, E), mapping function φ0, trajectory dataset D
Output: Road partitioning function φ, transformed trajectory dataset D'

1. Priority Queue PI
2. for i ← 1 to |E| do
3.     Compute Li = {Tr_1^i, Tr_2^i, …, Tr_p^i}, where Tr_k^i ∩ ei ≠ ∅
4.     PI.insert(ei, Li)
5. while not PI.empty() do
6.     Create new path Pp ← PI.extractMax() (= ei), Lp ← Li
7.     while ∃ ej : (ej ∩ Pp.start ≠ ∅ ∨ ej ∩ Pp.end ≠ ∅) ∧ (Lp ∩ Lj ≠ ∅) do
8.         Find ej that maximizes |Lp ∩ Lj|
9.         Pp ← Pp ∪ {ej}, Lp ← Lp ∩ Lj
10.        PI.delete(ej)
11.    p ← p + 1
12. Given Paths = {P1, P2, …, Pm}, compute φ, transform D to D'
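A compact Python rendering of Algorithm 1 is sketched below. It makes a few assumptions that the pseudocode leaves open: trajectories are given as sequences of traversed edge identifiers, edges have no self-loops, and, because the priorities |Li| never change while edges are only removed, the priority queue PI is replaced by a list sorted once by decreasing |Li| (ties kept in input order) plus a set of already-assigned edges, which preserves the O(|E| log |E|) sorting cost. The check that rejects edges closing a loop enforces the non-self-intersecting condition of Definition 1, and the transformation of D into D' (line 12) is omitted. All names are hypothetical.

```python
from collections import deque


def compute_network_paths(edges, dataset):
    """edges: edge_id -> (u, v); dataset: trajectory id -> traversed edge ids.
    Returns (phi, paths): phi maps each edge to a road id 1..m, paths lists the
    edges of each road in path order."""
    incident = {}
    for e, (u, v) in edges.items():
        incident.setdefault(u, []).append(e)
        incident.setdefault(v, []).append(e)
    # Lines 2-4: L_i = set of trajectories intersecting e_i.
    traj_lists = {e: set() for e in edges}
    for tid, traversed in dataset.items():
        for e in traversed:
            traj_lists[e].add(tid)
    # Stand-in for PI: edges by decreasing |L_i| (stable sort keeps input order on ties).
    order = sorted(edges, key=lambda e: len(traj_lists[e]), reverse=True)
    done, paths = set(), []
    for seed in order:                        # lines 5-6: extractMax
        if seed in done:
            continue
        done.add(seed)
        path = deque([seed])
        start, end = edges[seed]
        nodes_on_path = {start, end}
        shared = set(traj_lists[seed])        # L_p
        while True:                           # line 7: candidates at both extremities
            best, best_count = None, 0
            for at_start, node in ((True, start), (False, end)):
                for cand in incident[node]:
                    if cand in done:
                        continue
                    a, b = edges[cand]
                    other = b if a == node else a
                    if other in nodes_on_path:   # keep the path non-self-intersecting
                        continue
                    count = len(shared & traj_lists[cand])
                    if count > best_count:       # line 8: maximize |L_p ∩ L_j| (> 0)
                        best, best_count = cand, count
                        best_side, best_other = at_start, other
            if best is None:
                break
            done.add(best)                        # line 10
            shared &= traj_lists[best]            # line 9
            nodes_on_path.add(best_other)
            if best_side:
                path.appendleft(best)
                start = best_other
            else:
                path.append(best)
                end = best_other
        paths.append(list(path))
    phi = {e: rid for rid, p in enumerate(paths, start=1) for e in p}
    return phi, paths
```

On a toy network a–b–c–d with a branch b–x and trajectories ⟨e1, e2, e3⟩, ⟨e1, e2⟩, ⟨e4, e2, e3⟩, the sketch starts from e2 (three trajectories), absorbs e1 and then e3, and leaves e4 as a one-edge path.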

Figure 7 is a simple example of the way Algorithm 1 works. The algorithm starts by building a path at edge e1, considering that e1 has the maximum number of traversing trajectories. The candidate list at this point is ⟨e2, e5, e3, e4⟩. The algorithm picks e5 to extend the current path, as {e5, e2} maximizes the number of traversing trajectories. A new candidate list is built, i.e., ⟨e7, e6, e3, e4⟩, and similarly, e4 is chosen to extend the path. Finally, the extension stops after e7 is added to the current path, as there are no more adjacent edges. Also, {e2}, {e3} and {e6} form three different one-edge paths. We obtain a route-oriented network representation that reduces the number of traversed roads for the trajectories in the input dataset. It is worth noticing that the network path construction algorithm uses only the spatial dimension of the trajectories in the dataset and ignores their temporal dimension, which is logical since the objective of the algorithm is to alleviate the space discontinuity problem of the network space. It is also worth noticing another important difference between the proposed algorithm and the two path extension clustering algorithms [15, 19] presented in Section 2.3. The clustering algorithms consider only the trajectory flow at the extension point and not the individual trajectories composing the flow. In contrast, our algorithm extends the path with the edge that maximizes the number of common trajectories with the current path (line 8 in Algorithm 1), since the objective is to minimize the global number of discontinuity points in the trajectory set and not to determine dense paths.

Fig. 7 Example of network path construction

5 In-network STC

Given the network and data models introduced above (Section 3) and the partitioning algorithm that transforms the network space (Section 4), we introduce in this section a compression method with deterministic error bounds for in-network trajectories. The method is based on a well-known line simplification algorithm (i.e., Douglas-Peucker [8]) and on the synchronous network distance (defined below). The proposed compression method is independent of the employed network and data models. However, as shown in Section 6, both the proposed network and data models are essential for achieving high compression rates and are thus complementary to the in-network STC. Note also that the proposed data model and network partitioning method are separate solutions, each of which can be used together with the compression method. However, the maximum benefit is obtained when these solutions are combined.

Let us denote by $Tr^{or}$ an original trajectory and by $Tr^{app}$ the corresponding approximate trajectory. Given a trajectory $Tr^{or} = \langle gu_1, \ldots, gu_n \rangle$ (cf. Definition 6) in a road network (G, φ) and a user-defined threshold th, a compression algorithm with deterministic error bounds computes an approximate trajectory $Tr^{app} = \langle gu_1, \ldots, gu_m \rangle$, m ≤ n, such that the synchronous network distance (defined below) between each point of $Tr^{or}$ and $Tr^{app}$ is less than th. For simplicity, we ignore the moid of the trajectory in this section. Note also that both the generalized (cf. Definition 6) and the summarized (cf. Definition 8) trajectory representations can be used in the compression phase.

Definition 9 Degree of compression: The degree of compression DC is defined as:

$$DC = 1 - \frac{sizeof(Tr^{app})}{sizeof(Tr^{or})}$$

where sizeof indicates the number of bytes of a trajectory.

Let gu be a generalized trajectory unit between two time-stamped locations, i.e., (loc1, t1) and (loc2, t2), corresponding to the unit endpoints. We denote by gu.loc1 (or simply loc1 when there is no confusion concerning gu) and gu.loc2 (or simply loc2) the two network locations (cf. Definition 3) corresponding to the unit endpoints at times t1 and t2, respectively. Also, we denote by gu.loc (or simply loc) any other location belonging to gu at time t, i.e., (loc, t) with t ∈ (t1, t2). Finally, we denote by len(loca, locb) the geometric length of the polyline between loca and locb, which is also the network distance between loca and locb (see Fig. 8).

Fig. 8 Example of network distance between two network locations. The distance is synchronous when the two locations are considered at the same time instant


Given gu, the location of the MO at each time instant t ∈ (t1, t2) is obtained by linear interpolation. Therefore, for any t ∈ (t1, t2), the corresponding location gu.loc is situated at the following network distance from gu.loc1:

$$len(loc, loc_1) = \frac{len(loc_1, loc_2)}{t_2 - t_1}\,(t - t_1) \qquad (1)$$

Since each intermediate point on the polyline (loc1, loc2) is located at a certain distance from the endpoint loc1, one can determine the exact MO location at a given time instant based on Formula (1).
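Formula (1) can be turned into a small position lookup for a generalized unit. The sketch below assumes the unit is given as its list of road segments (rid, pos1, pos2) together with a hypothetical table of road lengths; it converts the elapsed time into a travelled network distance with Formula (1) and then walks the segments to find the road and relative position reached at time t. The function name is hypothetical.

```python
def locate_at(segments, road_length, t1, t2, t):
    """Return (rid, pos) of the MO at time t in [t1, t2] for a generalized unit
    whose road segments are (rid, pos1, pos2); road_length maps rid -> length."""
    if not t1 <= t <= t2:
        raise ValueError("t must lie in [t1, t2]")
    seg_lengths = [abs(p2 - p1) * road_length[rid] for rid, p1, p2 in segments]
    travelled = sum(seg_lengths) / (t2 - t1) * (t - t1)  # Formula (1)
    for (rid, p1, p2), length in zip(segments, seg_lengths):
        if length > 0 and travelled <= length:
            return rid, p1 + (p2 - p1) * travelled / length
        travelled -= length
    rid, _, p2 = segments[-1]
    return rid, p2  # end of the unit (also absorbs rounding at t == t2)
```

For example, with hypothetical lengths of 100, 50 and 200 for e7, e5 and e1, the first generalized unit of the Fig. 6 example covers 50 + 50 + 60 = 160 length units; halfway through its time interval the MO has travelled 80 units, i.e., it is at relative position 0.4 on e5.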

Note that for a given (original) trajectory $Tr^{or}$, any approximate trajectory $Tr^{app}$ follows exactly the same route in the network. In the case of constrained movement, only the temporal positioning along the followed route is approximated (see Fig. 8). This is different from the 2D model, where both the spatial and the temporal components of the trajectory are approximated (see Fig. 3).

Definition 10 Synchronous network distance: Given a trajectory $Tr^{or}$ and an approximation $Tr^{app}$ of $Tr^{or}$, we define the synchronous network distance between $Tr^{or}$ and $Tr^{app}$ as $sd(t) = len(loc^{or}, loc^{app})$, where $loc^{or}$ and $loc^{app}$ denote the network locations at time t on $Tr^{or}$ and $Tr^{app}$, respectively.
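Because an approximate trajectory follows exactly the same route as the original one, each trajectory can be encoded, for illustration, as time-stamped network distances d measured from the start of the route; the synchronous network distance is then the absolute difference between the two interpolated distances at the same instant. This encoding and the function names are assumptions of the sketch below.

```python
from bisect import bisect_right


def route_position(points, t):
    """Network distance from the route origin at time t, by linear interpolation
    (Formula (1)) between time-stamped points [(t, d), ...] sorted by time."""
    times = [tp for tp, _ in points]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory time interval")
    k = bisect_right(times, t) - 1
    if k == len(points) - 1:
        return points[-1][1]
    (ta, da), (tb, db) = points[k], points[k + 1]
    return da + (db - da) * (t - ta) / (tb - ta)


def synchronous_distance(original, approx, t):
    """Definition 10: sd(t) = len(loc_or, loc_app), measured along the common route."""
    return abs(route_position(original, t) - route_position(approx, t))
```

For an original trajectory ⟨(0, 0), (10, 50), (20, 160)⟩ approximated by ⟨(0, 0), (20, 160)⟩, the approximate MO is at distance 80 at t = 10 while the original one is at 50, so sd(10) = 30.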

Lemma 1: Given a road network G = (V, E), a mapping function φ and an in-network trajectory $Tr^{or} = \langle gu_1, \ldots, gu_n \rangle$, a compression method based on the Douglas-Peucker line simplification algorithm [8] and the synchronous network distance forms a compression algorithm with deterministic error bounds.

Proof. Given $Tr^{or} = \langle gu_1, \ldots, gu_n \rangle$, the compression algorithm takes as input the series of all the time-stamped positions in $Tr^{or}$, i.e.,

$$\langle (loc_1^1, t_1^1), (loc_2^1, t_2^1), (loc_2^2, t_2^2), \ldots, (loc_2^{n-1}, t_2^{n-1}), (loc_2^n, t_2^n) \rangle.$$

The sequence contains the start location of the first trajectory unit and the end locations of all the trajectory units. The class of compression algorithms that we consider in this paper (i.e., batch algorithms with deterministic error bounds, cf. Sections 2.1.1 and 2.1.2) produces as output a subseries of the input series by discarding some data points such that the network distance from the original data points to the approximate trajectory is below a user-defined threshold th. For example, assuming that the DP algorithm is selected to perform the compression, the only difference compared with the original algorithm presented in Section 2.1.2 is the use of the synchronous network distance instead of the geometric distance between $Tr^{or}$ and $Tr^{app}$.

We prove that the synchronous distance between any point of the continuous $Tr^{or} = \langle gu_1, \ldots, gu_n \rangle$, where each $gu_i$ uses linear interpolation between the unit endpoints, and the continuous $Tr^{app} = \langle gu_1, \ldots, gu_m \rangle$ is less than th.

Given $gu_i^{or} \in Tr^{or}$, there is a $gu_j^{app} \in Tr^{app}$ such that $gu_i^{or} \subseteq gu_j^{app}$. For any $t \in (t_1^i, t_2^i)$, the corresponding location $gu_i^{or}.loc^{or}$ on $Tr^{or}$ is situated at the following network distance from $gu_i^{or}.loc_1^{or}$:

$$len(loc^{or}, loc_1^{or}) = \frac{len(loc_1^{or}, loc_2^{or})}{t_2^i - t_1^i}\,(t - t_1^i) \qquad (2)$$

(cf. Formula (1)). Similarly, for all $t \in (t_1^i, t_2^i)$, the corresponding location $gu_j^{app}.loc^{app}$ on $Tr^{app}$ is situated at the following network distance from $gu_j^{app}.loc_1^{app}$:

$$len(loc^{app}, loc_1^{app}) = \frac{len(loc_1^{app}, loc_2^{app})}{t_2^j - t_1^j}\,(t - t_1^j) \qquad (3)$$


We can rewrite (2) by considering $gu_j^{app}.loc_1^{app}$ as the reference point instead of $gu_i^{or}.loc_1^{or}$:

$$len(loc^{or}, loc_1^{app}) = \frac{len(loc_1^{or}, loc_2^{or})}{t_2^i - t_1^i}\,(t - t_1^i) + len(loc_1^{or}, loc_1^{app}) \qquad (4)$$

The synchronous network distance between $gu_i^{or}.loc^{or}$ and $gu_j^{app}.loc^{app}$, for all $t \in (t_1^i, t_2^i)$, is:

$$sd(t) = \left| len(loc^{app}, loc_1^{app}) - len(loc^{or}, loc_1^{app}) \right| \qquad (5)$$

From (3) and (4), sd(t) can be rewritten as sd(t) = |at + b|, where a and b are two constants. Hence, on the interval $(t_1^i, t_2^i)$, the synchronous network distance between $Tr^{or}$ and $Tr^{app}$ is the absolute value of a linear function. At the same time, we have

$$sd(t_1^i) < th \qquad (6)$$

and

$$sd(t_2^i) < th \qquad (7)$$

from the condition imposed on the compression process. Since sd(t) = |at + b| is a convex function of t, its maximum over $[t_1^i, t_2^i]$ is attained at one of the interval endpoints; hence, from Formulas (6) and (7), it follows that sd(t) < th for all $t \in (t_1^i, t_2^i)$.
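Following the proof above, the only change to the Douglas-Peucker algorithm is the distance used to decide whether a point is kept. The sketch below is an iterative variant, not the implementation of [24] or of our experiments; for illustration it encodes a trajectory as time-stamped network distances (t, d) measured from the start of its route, which is possible because the approximate trajectory follows the same route. A point is retained when its synchronous distance to the current approximation is at least th, so every discarded point, and by Lemma 1 every intermediate instant, stays strictly below th. The function name is hypothetical.

```python
def compress(points, th):
    """Douglas-Peucker with the synchronous network distance.

    points: time-stamped route positions [(t, d), ...] sorted by t, where d is
    the network distance from the route origin. Returns the retained points;
    every discarded point lies at a synchronous distance < th from the result.
    """
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        i, j = stack.pop()
        (ti, di), (tj, dj) = points[i], points[j]
        worst, worst_k = -1.0, None
        for k in range(i + 1, j):
            tk, dk = points[k]
            # Position of the approximate trajectory at time tk (linear interpolation).
            d_app = di + (dj - di) * (tk - ti) / (tj - ti)
            sd = abs(dk - d_app)
            if sd > worst:
                worst, worst_k = sd, k
        if worst_k is not None and worst >= th:
            keep[worst_k] = True
            stack.append((i, worst_k))
            stack.append((worst_k, j))
    return [p for p, kept in zip(points, keep) if kept]
```

Unlike the 2D version, the distance is measured at the same time instant along the route rather than perpendicularly to a line segment, so a point that lies on the route but is reached at the “wrong” time is still kept.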

The compression method proposed in this section is based on the modified Douglas-Peucker line simplification algorithm. However, other compression methods, such as the ones presented in Section 2.1, can be employed instead. For instance, the optimal line simplification algorithm [17], which finds the simplified polyline having the minimum number of points for a given error bound, can be considered as an alternative to the Douglas-Peucker algorithm. Besides, the methods for sampling trajectory streams (see Section 2.1.3) can also be adapted to constrained trajectories if applications do not require an error bound on the approximation. This is possible since the proposed data model and network partitioning algorithm are orthogonal to the employed compression method.

Lemma 1 states that the network distance between any two points on the original trajectory and the approximate trajectory at the same time instant, i.e., the synchronous network distance, is less than the user-defined threshold th. Hence, th represents an upper error bound of the approximation. However, the actual synchronous distance between $Tr^{or}$ and $Tr^{app}$ at any given time instant could be much lower. In order to have an aggregate measure of the actual error introduced by the compression process, we consider, similarly to [24], the average synchronous error between $Tr^{or}$ and $Tr^{app}$.

Definition 11 Average synchronous error: Given a trajectory $Tr^{or}$ and an approximation $Tr^{app}$ of $Tr^{or}$, we define the average synchronous error ASE between $Tr^{or}$ and $Tr^{app}$ as:

$$ASE(Tr^{or}, Tr^{app}) = \frac{\sum_{i=1}^{n} ase(t_1^i, t_2^i)}{t_2^n - t_1^1}, \quad \text{where } ase(t_1^i, t_2^i) = \int_{t_1^i}^{t_2^i} sd(t)\,dt$$

and sd(t) is computed with Formula (5). For each generalized unit in $Tr^{or}$, we compute the integral of the instantaneous distance sd(t) between $Tr^{or}$ and $Tr^{app}$, which gives the local average distance multiplied by the length of the unit time interval $[t_1^i, t_2^i]$. Then, the sum of all the weighted local average distances is divided by the length of the trajectory time interval $[t_1^1, t_2^n]$ to obtain the average error.


6 Experimental evaluation

We have experimentally evaluated the proposed compression method based on the modified Douglas-Peucker algorithm (cf. Section 5), the extended data model and the network partitioning algorithm. As mentioned earlier, there is currently no method for in-network trajectory compression with deterministic error bounds. To assess the appropriateness of our approach for constrained MOs, we used as reference the first method proposed in [18] (see Section 2.2). That is, we employ the 2D compression algorithm proposed in [24], which also uses an adapted Douglas-Peucker algorithm (cf. Section 2.1.2), and then consider the obtained compressed trajectory in the network space. Also, to underline the importance of the extended data model for the compression rate, we tested the in-network compression method both with the basic data model and with the proposed data model (cf. Section 3.2). We measured the compression rate and the average synchronous error for a wide range of user-defined threshold values. The details of the employed road networks, trajectory datasets and testing parameters are given below.

This section is organized as follows. Section 6.1 introduces the road networks and the datasets used for testing. In Section 6.2, we measure the compression rates and the corresponding ASE values of the compression algorithm when using the basic and the extended data models in an edge-oriented network. We then evaluate the proposed network partitioning algorithm in Section 6.3. We compare the effectiveness of the proposed approach with a direction-based network partitioning that creates long straight routes (see Section 2.3) and with the ideal partitioning. Finally, in Section 6.4, we evaluate the time efficiency of the compression method and the partitioning algorithm.

6.1 Experimental setting

All the experiments were conducted on a Pentium 4, 3.2 GHz machine with 4 GB of RAM (note: the tests do not require this much memory) running Windows XP. We used Java 5.0 to implement the two compression methods. The average synchronous error was computed according to Definition 11 for both the in-network compression and the 2D compression. The DC was computed according to Definition 9. For a trajectory unit, we considered that moid, rid and pos are represented on four bytes each and that t is represented on eight bytes. We also used Oracle 11g EE to store the road networks and the trajectory datasets.
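As a rough illustration of how these byte sizes interact with the data models, the sketch below computes sizeof for the three representations of the Fig. 6 example and the resulting DC of Definition 9. The accounting is an assumption of this sketch: moid is stored once per trajectory, and any length prefix needed to encode variable-length road sequences is ignored. The function names are hypothetical.

```python
# Field sizes (bytes) used in the experiments: moid, rid and pos on 4 bytes, t on 8 bytes.
MOID, RID, POS, T = 4, 4, 4, 8


def size_classical(units):
    """Basic model: each unit is (rid, pos1, t1, pos2, t2)."""
    return MOID + len(units) * (RID + 2 * POS + 2 * T)


def size_generalized(units):
    """Generalized model (Definition 6): each unit is (segments, t1, t2),
    with segments = [(rid, pos1, pos2), ...]."""
    return MOID + sum(len(segments) * (RID + 2 * POS) + 2 * T for segments, _, _ in units)


def size_summarized(units):
    """Summarized model (Definition 8): each unit is (pos1, rids, pos2, t1, t2)."""
    return MOID + sum(2 * POS + len(rids) * RID + 2 * T for _, rids, _, _, _ in units)


def degree_of_compression(size_original, size_approx):
    """Definition 9: DC = 1 - sizeof(Tr_app) / sizeof(Tr_or)."""
    return 1 - size_approx / size_original


# Units of the Fig. 6 example (time stamps kept symbolic; they do not affect sizes).
classical = [('e7', 0.5, 'tA', 1, 'tB'), ('e5', 1, 'tB', 0, 'tC'), ('e1', 1, 'tC', 0.7, 'tD'),
             ('e1', 0.7, 'tD', 0, 'tE'), ('e4', 0, 'tE', 0.6, 'tF')]
generalized = [([('e7', 0.5, 1), ('e5', 1, 0), ('e1', 1, 0.7)], 'tA', 'tD'),
               ([('e1', 0.7, 0), ('e4', 0, 0.6)], 'tD', 'tF')]
summarized = [(0.5, ['e7', 'e5', 'e1'], 0.7, 'tA', 'tD'), (0.7, ['e1', 'e4'], 0.6, 'tD', 'tF')]
```

Under these assumptions the example takes 144, 96 and 72 bytes, respectively, so the change of representation alone yields DC = 0.5 for the summarized version before any data point is removed by compression.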

The compression algorithms were tested using both real and synthetic data (see Table 1). The real dataset contains 80 trajectories of approximately 4.7 km each in the surroundings of the city of Versailles (France). In the edge-oriented network model, Versailles has 14414 edges. Nevertheless, the real trajectory dataset covers only a small region of the network (see Fig. 9 (left)). The dataset comes from a complementary study in the LAVIA project (a French acronym for “Speed limiter that adapts to the speed limit”) [9]. The dataset contains 8 groups of 10 trajectories each, recorded by several drivers with specific driving manners (e.g., normal, nervous, economical or with the LAVIA system activated). Each trajectory is a portion of the predefined circuit represented in Fig. 9. The trajectories in a group follow exactly the same paths, and the paths of different groups are disjoint. The location was sampled by a GPS mounted in the car at the highest sampling rate, i.e., every two seconds. Also, to simulate a controlled sampling rate, we filtered the original trajectory points such that the MO location is sampled every time the MO changes the road and at least every 100 m. We used both the original and the filtered dataset in the tests.

Table 1 Test datasets

Dataset name            # of trajectories   Avg. # of units per trajectory   Avg. trajectory length   Avg. unit length
Versailles              80                  287                              4.7 km                   16 m
Versailles (filtered)   80                  66                               4.7 km                   71 m
Oldenburg               49011               53                               2886 %                   54 %
Stockton                33192               89                               7758 %                   87 %

For Oldenburg and Stockton, lengths are given as a percentage of the average edge length of the network.
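The filtering of the Versailles dataset is only described informally above; the sketch below is one possible reading of it, and the exact rule is an assumption. Samples are given as (t, rid, dist), with dist the cumulative distance travelled in meters; a sample is kept when it is the first sample on a new road, or when skipping it would leave more than 100 m between the last kept sample and the next raw sample; the first and last samples are always kept. As long as consecutive raw samples are at most 100 m apart, kept samples are then never more than 100 m apart. The function name is hypothetical.

```python
def filter_samples(samples, max_gap=100.0):
    """samples: [(t, rid, dist), ...] in time order, dist = cumulative distance (m).
    Keeps road changes and enough samples to stay within max_gap metres."""
    if len(samples) <= 2:
        return list(samples)
    kept = [samples[0]]
    for k in range(1, len(samples) - 1):
        prev, cur, nxt = samples[k - 1], samples[k], samples[k + 1]
        road_change = cur[1] != prev[1]
        # Skipping cur would leave a gap larger than max_gap before the next sample.
        too_far = nxt[2] - kept[-1][2] > max_gap
        if road_change or too_far:
            kept.append(cur)
    kept.append(samples[-1])
    return kept
```

For a vehicle covering 30 m every 2 s on road r1 and switching to r2 after 120 m, the sketch keeps the samples at 0 m, 90 m, 150 m (first sample on r2) and the final one.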

Since the available real trajectory datasets are not representative enough in terms of trajectory distribution over a road network, we employed the well-known Brinkhoff generator [1] to create in-network trajectory datasets. The synthetic datasets were used to test the impact of the proposed network partitioning algorithm on the compression rate. We used two of the road networks that are available with the generator, i.e., the city of Oldenburg (Germany) and the city of Stockton (San Joaquin County, CA) (see Fig. 9 (center and right)). In the edge-oriented network model, Oldenburg has 3803 edges, whereas Stockton has 20291 edges. The generator randomly generates a source and a destination for each trajectory based on the network node density. As in the filtered Versailles dataset, in the simulated datasets an MO reports its position each time it traverses a (geometric) segment or at least every 10 time units.

For each trajectory dataset in Table 1, we vary the input threshold value of the compression algorithms and measure the obtained degree of compression (DC) and the average synchronous error (ASE). The threshold values depend on the scale of the underlying road network. For each road network, we first computed the average edge length. Then, we chose the threshold values as a percentage of the average edge length, varying from 1 % to 128 % in a geometric sequence with a common ratio of 2. The ASE values are also measured as a percentage of the average edge length. For the road network of Versailles we had the latitude/longitude representation, which allowed us to compute the average edge length in meters (approximately 100 m) and to express the threshold and ASE values in meters as well. Unlike Versailles, the trajectory lengths and the thresholds for Oldenburg and Stockton are expressed as a percentage of the average edge length, because the reference system used in these networks is unknown.
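As a worked illustration of this threshold schedule, the sketch below computes the eight threshold values (1 % to 128 % of the average edge length, common ratio 2). The function name is hypothetical. With the approximately 100 m average edge length of Versailles, the thresholds in meters coincide numerically with the percentages, which matches the 4 m and 32 m thresholds discussed for the Versailles results.

```python
from typing import List


def threshold_values(avg_edge_length: float,
                     first_percent: int = 1,
                     last_percent: int = 128) -> List[float]:
    """Thresholds as a geometric sequence (common ratio 2) of percentages
    of the average edge length: 1 %, 2 %, 4 %, ..., 128 %."""
    values = []
    p = first_percent
    while p <= last_percent:
        values.append(avg_edge_length * p / 100)
        p *= 2
    return values


# Versailles: average edge length of approximately 100 m.
print(threshold_values(100.0))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
```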

In the experiments, we use and compare different data and network models to represent the trajectory data. In particular, we evaluate the proposed data model (with an edge-oriented network) in Section 6.2 and the partitioning algorithm (i.e., edge versus route networks) in Section 6.3. Table 2 presents the labels we use in the graphical representation of the experimental results. Note that in some cases we use different labels for the same combination of data model and network model, depending on the setting.

Fig. 9 Versailles network and trajectories (left), Oldenburg (center) and Stockton (right) networks


6.2 DC versus ASE

The graphics in Figs. 10 and 11 present the results obtained for the DC and ASE for the tested datasets. The curves indicate the DC and the ASE for the 2D compression algorithm (labeled “2D” and used as reference) and for the in-network compression algorithm in different scenarios. We measured the DC obtained with the basic data model (labeled “Basic DM”) and with the two proposed extended data models. The “Optimal” curves indicate the DC obtained in the ideal situation where each trajectory is contained in a single road. In this case, similar to the 2D compression, the DC depends only on the number of data units in the original and the compressed representation. Note that for the Versailles datasets the optimal compression rates can be obtained by creating long routes with the proposed network partitioning algorithm. Indeed, the trajectories in this dataset are either spatially equal or spatially disjoint, and the partitioning algorithm will create a road for each group of trajectories. However, in large datasets trajectories overlap only partially, and in the general case it is impossible to attain the optimal compression rates for all the trajectories.

Table 2 Labels used in the experiments

Network model | Basic data model | Extended data model | Extended summarized data model
Edge-oriented | Basic DM / Basic DM edge | Ext. DM / Ext. DM edge | Ext. SDM / Ext. SDM edge
Route-oriented | Basic DM route | Ext. DM route / Ext. DM greedy / Ext. DM direction | Ext. SDM route / Ext. SDM greedy / Ext. SDM direction

Fig. 10 Degree of compression (data models)

A first observation regarding the graphics in Figs. 10 and 11 is that the in-network compression algorithm has good potential (“Optimal” vs. “2D”) to achieve a better DC at a (much) lower ASE than the 2D compression algorithm. This is expected, since the compression considers the network space instead of the 2D space. The second observation is that the basic data model largely cancels this potential, especially if the trajectories are sampled at a frequency lower than the maximum sampling frequency (e.g., the Versailles filtered, Oldenburg and Stockton datasets). This is due to the discontinuity points in the network space, i.e., a new trajectory unit is created each time the MO changes road (see Section 3). In contrast, the proposed extended data model solves this problem and offers DC and ASE values comparable with the 2D compression, even when using an edge-oriented network model. Note also that for the in-network compression algorithm the ASE values are independent of the employed data model. The ASE depends only on the compression threshold value, since it is determined by the data points eliminated through compression and not by the manner in which the resulting compressed trajectory is represented.

The obtained compression rates are (very) high (when using the extended data model) for relatively small threshold values for both the real and the synthetic datasets, which demonstrates the utility of the compression algorithm. Also, the higher the sampling rate, the higher the obtained DC (see Versailles vs. Versailles filtered). For example, an error bound of only 4 m is sufficient to obtain a DC of 0.8 (i.e., a factor of five between the original and the compressed data size) for the Versailles dataset, whereas a similar DC requires a threshold of 32 m for the Versailles filtered dataset.
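The correspondence between a DC of 0.8 and a factor of five follows if DC is taken as one minus the ratio between the compressed and the original size, both measured in data units. The sketch below assumes this definition, which is consistent with the text but not restated in this section; the function names and the 100/20 unit counts are purely illustrative.

```python
def degree_of_compression(original_units: int, compressed_units: int) -> float:
    """DC = 1 - (compressed size / original size), sizes in data units."""
    if original_units <= 0:
        raise ValueError("original_units must be positive")
    return 1.0 - compressed_units / original_units


def size_reduction_factor(dc: float) -> float:
    """Ratio original size / compressed size implied by a given DC."""
    return 1.0 / (1.0 - dc)


# Illustrative numbers only: 100 units compressed to 20 units.
dc = degree_of_compression(100, 20)
print(round(dc, 3), round(size_reduction_factor(dc), 3))  # 0.8 5.0
```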

The synthetic datasets behave similarly to the Versailles filtered dataset. A major difference is that for the real dataset the DC grows linearly with the threshold value, whereas for the synthetic datasets high DC values are obtained even for small thresholds. Although the MO generator [1] varies the speed of the generated MOs, the variations are less frequent than in the real world, where vehicles change speed often. Also, the speed variations of the generated MOs are more frequent when the density of MOs is high (i.e., the generator simulates congestion). This explains the difference between the Oldenburg and Stockton datasets, since the Stockton road network is larger than that of Oldenburg, whereas the number of generated MOs is lower for Stockton (see Table 1).

Fig. 11 ASE vs. degree of compression

As expected, the obtained ASE values are much smaller than the threshold values. This indicates that, on average, the introduced error is well below the maximum accepted error given as a user-defined threshold to the compression method. For example, for a threshold value of 4 m we obtained an ASE of less than 1 m (and a DC of 0.8) for the Versailles dataset. Similarly, a threshold value of 32 m led to an ASE of approximately 7 m (and a DC of 0.8) for the Versailles filtered dataset.
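To make the ASE measurement concrete, the sketch below computes the synchronous error of each original sample against the compressed trajectory and averages the errors. It assumes a simplified one-dimensional setting in which each sample is a hypothetical (t, pos) pair, pos being the offset in meters along a single road, so that the network distance reduces to the absolute difference of offsets; averaging over the original sample timestamps is also an assumption. The sketch illustrates why the ASE depends only on which points are eliminated and not on how the compressed trajectory is encoded.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical 1D model: (t, pos) with pos = offset in meters along one road.
Point = Tuple[float, float]


def synchronous_position(compressed: List[Point], t: float) -> float:
    """Position on the compressed trajectory at time t (linear interpolation
    between the two compressed points that bracket t)."""
    times = [p[0] for p in compressed]
    j = bisect_right(times, t)
    if j == 0:
        return compressed[0][1]
    if j == len(compressed):
        return compressed[-1][1]
    (t0, x0), (t1, x1) = compressed[j - 1], compressed[j]
    return x0 + (x1 - x0) * (t - t0) / (t1 - t0)


def average_synchronous_error(original: List[Point],
                              compressed: List[Point]) -> float:
    """Mean distance between each original sample and the compressed
    trajectory evaluated at the same timestamp."""
    errors = [abs(x - synchronous_position(compressed, t)) for t, x in original]
    return sum(errors) / len(errors)


original = [(0, 0.0), (1, 10.0), (2, 10.0), (3, 10.0), (4, 20.0)]
compressed = [(0, 0.0), (4, 20.0)]  # only the endpoints are kept
print(average_synchronous_error(original, compressed))  # 2.0
```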

6.3 Network partitioning into routes

As discussed in Section 4, we can further improve the DC of the in-network compression method by reducing the number of roads in the network, i.e., by employing a route-oriented network model. In this section, we evaluate the effectiveness of the proposed network partitioning algorithm. We use the two synthetic datasets, i.e., Oldenburg and Stockton, which contain a large number of trajectories. The trajectories in these two datasets cover the road networks entirely.

Figure 12 indicates the DC for different threshold values for the synthetic datasets. The routes in this test were computed with the proposed “greedy” algorithm (see Table 3). The network partitioning algorithm significantly improves the compression rate of the in-network compression using the basic data model and the route-oriented network model. The gain in DC is about 95 % for the Oldenburg dataset and 280 % for the Stockton dataset (“Basic DM edge” vs. “Basic DM route” in Fig. 12). The larger the network, the more effective and useful the partitioning algorithm. Nevertheless, the compression rates are lower than those obtained by using the extended data model. The gap between the DC obtained with the extended summarized data model and with the basic data model over a route-oriented network is approximately 13 % for Oldenburg and 11 % for Stockton (“Basic DM route” vs. “Ext. SDM route” in Fig. 12). Moreover, the network partitioning improves the DC of the extended summarized data model by 6.3 % for Oldenburg and 10 % for Stockton (“Ext. SDM edge” vs. “Ext. SDM route” in Fig. 12). Note also that the DC values obtained with the extended summarized data model and the route-oriented network model are very close to the optimal compression rates (i.e., the difference is approximately 2 % for Oldenburg and 1.8 % for Stockton; “Ext. SDM route” vs. “Optimal” in Fig. 12).

Table 3 Network partitioning statistics (route lengths in number of edges)

Network (method) | # of routes | Avg. route length | Min. route length | Max. route length | Std. dev.
Oldenburg (greedy) | 1289 | 2.95 | 1 | 101 | 5.48
Oldenburg (direction) | 1435 | 2.65 | 1 | 103 | 4.92
Stockton (greedy) | 5382 | 3.77 | 1 | 172 | 8.11
Stockton (direction) | 6039 | 3.36 | 1 | 124 | 6.77

Fig. 12 Degree of compression (route vs. edge)


We also evaluate the effectiveness of the proposed partitioning algorithm against a direction-based partitioning similar to the direction-based approach proposed in [6], i.e., one that creates long roads by concatenating network segments lying in a straight line. We start with the longest segment in the network and extend it at both ends with the segment having the smallest angle (with the current end segment) among the candidate segments. The extension of the current road ends when the minimum angle between the candidate segments and the end segment of the current road is greater than 90° or when there are no more candidate segments. The process then continues with the next longest segment not included in a previously constructed road, until all the segments are considered.

Table 3 lists the statistics of the routes computed with the greedy and the direction-based algorithms. The statistics are expressed in number of edges. On average, the two partitioning methods produce routes of similar lengths. However, the proposed method is more efficient in reducing the average number of discontinuities per trajectory (cf. Table 4). The average number of roads per trajectory for the two synthetic datasets before and after the partitioning is indicated in Table 4. In the absence of partitioning, a trajectory traverses on average about 31 roads (edges) in the Oldenburg dataset and 69 roads (edges) in the Stockton dataset. These numbers are reduced significantly through network partitioning. Note that the direction-based method leaves 21.6 % more discontinuities than the proposed greedy approach for Oldenburg and 34.3 % more for Stockton. This is because in our method the partitioning is guided by the trajectory distribution, whereas in the direction-based method the partitioning is based on the network topology and geometry and does not consider the data. Once more, we observe that the larger the network (or the longer the trajectory), the more effective the greedy partitioning algorithm.
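The discontinuity comparison can be checked directly from Table 4. The short computation below (names are illustrative) expresses the additional discontinuities of the direction-based method as a percentage of the greedy values, which appears to be how the 21.6 % and 34.3 % figures were obtained; it yields about 21.7 % and 34.3 %, the small difference for Oldenburg being attributable to rounding of the table entries. It also reports the reduction obtained by greedy partitioning with respect to no partitioning.

```python
# Average number of roads (edges) traversed per trajectory, from Table 4.
table4 = {
    "Oldenburg": {"none": 31.26, "greedy": 9.41, "direction": 11.45},
    "Stockton": {"none": 68.8, "greedy": 12.73, "direction": 17.1},
}


def relative_excess(direction: float, greedy: float) -> float:
    """Extra discontinuities of the direction-based method, in % of greedy."""
    return (direction - greedy) / greedy * 100


def reduction(before: float, after: float) -> float:
    """Reduction (in %) of the average number of roads per trajectory."""
    return (before - after) / before * 100


for net, r in table4.items():
    excess = relative_excess(r["direction"], r["greedy"])
    saved = reduction(r["none"], r["greedy"])
    print(f"{net}: direction vs greedy +{excess:.1f} %, "
          f"greedy vs no partitioning -{saved:.1f} %")
```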

Figure 13 presents the DC obtained for the in-network compression method with the extended (summarized) data model for the Oldenburg and Stockton datasets. The proposed greedy approach offers better compression rates than the direction-based approach, as it is more efficient in reducing the number of discontinuities. The proposed partitioning method adds a further margin of 4.8 % for Oldenburg and 6.2 % for Stockton with the basic data model (not shown in Fig. 13), of 1.9 % for Oldenburg and 2.4 % for Stockton with the extended data model (“Ext. DM greedy” vs. “Ext. DM direction” in Fig. 13), and of 0.6 % for Oldenburg and 0.8 % for Stockton with the extended summarized data model (“Ext. SDM greedy” vs. “Ext. SDM direction” in Fig. 13). Note that the impact of reducing the number of discontinuities on the compression rate diminishes as we pass from the basic data model to the extended data model and then to the extended summarized data model (i.e., part of the gain in DC is already achieved by the data model).

In conclusion, the results presented in this section show that both the extended data model (Fig. 10) and the route-oriented network model (Fig. 12) are alternative solutions for attaining good compression rates with the in-network compression algorithm. Moreover, combining the two solutions (Fig. 13) can lead to compression rates close to the optimal values.

6.4 Time efficiency experiments

We evaluate in this section the time efficiency of the proposed compression method and partitioning algorithm. The proposed compression method is based on the well-known DP algorithm [8] (see Section 2.1.2), which we adapted for in-network trajectory compression. We employed the straightforward implementation of the DP algorithm, with a worst-case running time of O(N²), as the compression times were more than satisfactory. However, as indicated in Section 2.1.2, the time complexity can be further improved to O(N log N) [16] and O(N log^k N) (where k ∈ {2, 3}) [13].

Table 4 Discontinuity points per trajectory statistics (avg. # of roads per trajectory)

Partitioning | Oldenburg | Stockton
No partitioning | 31.26 | 68.8
Greedy-based partitioning | 9.41 | 12.73
Direction-based partitioning | 11.45 | 17.1

Figure 14 (left) indicates the average time needed to compress a trajectory for all four trajectory datasets used in the experiments. We measured the total compression time of a dataset and then averaged it over the number of trajectories in the dataset. The presented results are also averaged over 10 runs. The compression time depends on the length of the trajectory, i.e., the number of data points in the original trajectory. Moreover, the compression time decreases significantly as the input threshold value increases, which is expected since larger threshold values require fewer recursions in the DP algorithm. The Versailles dataset contains the longest trajectories and therefore has the highest compression times. Figure 14 (right) details the compression times for the remaining datasets, which contain shorter trajectories. For these datasets, the compression times are similar. In general, the compression algorithm is time efficient, with compression times of a fraction of a millisecond on our test machine.
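A minimal sketch of the DP recursion with the synchronous distance is given below, in a simplified one-dimensional setting (hypothetical (t, pos) samples, pos being the offset in meters along a single road). It is not the authors' implementation: it uses an explicit stack instead of recursion and ignores road changes, which the proposed data model handles. It does show the straightforward O(N²) worst case and why larger thresholds mean fewer subdivisions: a segment is split only while its worst synchronous error exceeds the threshold.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical: (t, pos), pos in meters along one road


def dp_synchronous(points: List[Point], threshold: float) -> List[int]:
    """Douglas-Peucker simplification using the synchronous distance.

    For each segment (a, b), the position of every intermediate sample is
    compared with the position interpolated at the same timestamp; the
    segment is split at the worst sample while that error exceeds the
    threshold. Timestamps are assumed strictly increasing.
    Returns the indices of the kept samples.
    """
    if len(points) <= 2:
        return list(range(len(points)))
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:
        a, b = stack.pop()
        (ta, xa), (tb, xb) = points[a], points[b]
        worst, worst_err = None, -1.0
        for i in range(a + 1, b):
            ti, xi = points[i]
            expected = xa + (xb - xa) * (ti - ta) / (tb - ta)
            err = abs(xi - expected)
            if err > worst_err:
                worst, worst_err = i, err
        if worst is not None and worst_err > threshold:
            keep.add(worst)
            stack.append((a, worst))
            stack.append((worst, b))
    return sorted(keep)


# The MO stops at position 10 m between t = 1 and t = 3.
samples = [(0, 0.0), (1, 10.0), (2, 10.0), (3, 10.0), (4, 20.0)]
print(dp_synchronous(samples, threshold=1.0))   # [0, 1, 3, 4]
print(dp_synchronous(samples, threshold=10.0))  # [0, 4]
```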

We also measured the time needed to partition the road network using the proposed greedy-based method and the direction-based method used as reference. The results are displayed in Fig. 15. The proposed partitioning method has a worst-case running time that is loglinear in the network size and linear in the trajectory dataset size (see Section 4). The direction-based partitioning method uses only the topological and geometrical information of the road network to perform the partitioning, i.e., its worst-case running time is loglinear in the network size. On the other hand, the proposed greedy method is more efficient than the direction-based method in reducing the number of discontinuities (see Table 4) and therefore has more potential to increase the compression rate.

The difference in execution times is significant for the smaller network, i.e., Oldenburg. Surprisingly, the partitioning times are much closer for the larger network, i.e., Stockton. This indicates that the greedy method scales better with the network size. The reason is that the direction-based method requires more costly angle computations. Note also that the two datasets have similar sizes, i.e., around 2.6M units for Oldenburg and 2.9M units for Stockton. A simple way to improve the time efficiency of the greedy method would be to sample the trajectory dataset before partitioning, but we leave this study for future work.

Fig. 13 Evaluation of the network partitioning methods

Fig. 14 Average time to compress a trajectory

7 Conclusion and future work

The continuous increase of trajectory data volumes raises computational, transmission and storage challenges. In this paper, we tackled the problem of compressing in-network trajectory data with deterministic error bounds. We showed that the existing 2D compression methods are not adapted to constrained trajectory data. We also analyzed the limitations of the classical in-network trajectory data model for in-network compression. Then, we proposed an extended data model and a network partitioning algorithm adapted to in-network trajectory compression. Based on the well-known Douglas-Peucker line simplification algorithm and on the above-mentioned proposals, we introduced a method for in-network trajectory compression with deterministic error bounds. To this end, we adapted the synchronous distance to the network space. The experimental evaluation of the proposed compression method clearly shows the appropriateness of our approach. For instance, we obtain a compression rate of 80 % with an error bound of only 4 m.

Most of the existing works on spatio-temporal trajectory compression focus on obtaining the best tradeoff between the compression rate, the introduced error and the computational cost. Nevertheless, another important aspect strongly related to trajectory compression seems to have received little attention, i.e., assessing the effect that compression has on querying the compressed data. We see two interesting directions to explore in this context. First, we intend to estimate the effect that the compression error has on the query result quality in terms of false positive/negative rates [11]. Second, we plan to evaluate the query performance gain obtained through compression and to search for solutions that maximize this gain.

The compression method proposed in this paper can also be used for trajectory data streams by replacing the Douglas-Peucker algorithm with an opening window algorithm [21, 24]. One would then obtain a compression method with deterministic error bounds for in-network trajectory streams. Nevertheless, the processing complexity of such an algorithm could be too high in certain cases [21] (e.g., if the compression takes place directly on a mobile device). As future work, we also plan to study compression techniques for in-network trajectory data streams under such constraints.
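For the streaming variant, the opening window idea can be sketched as follows, in a simplified one-dimensional setting with hypothetical (t, pos) samples along a single road. The window grows from an anchor point until some intermediate sample exceeds the synchronous-distance threshold; the segment is then closed at the sample preceding the newly added one, which becomes the new anchor. This is one possible opening window variant, not the specific algorithm of [21] or [24], and the function names are illustrative.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # hypothetical: (t, pos), pos in meters along one road


def _sync_error(p: Point, a: Point, b: Point) -> float:
    """Distance between p and the a-b segment interpolated at p's timestamp."""
    (ta, xa), (tb, xb), (tp, xp) = a, b, p
    return abs(xp - (xa + (xb - xa) * (tp - ta) / (tb - ta)))


def opening_window(stream: Iterable[Point], threshold: float) -> List[Point]:
    """Online simplification: points are consumed one at a time.
    Timestamps are assumed strictly increasing."""
    kept: List[Point] = []
    window: List[Point] = []  # window[0] is the current anchor
    for p in stream:
        if not window:
            window = [p]
            kept.append(p)
            continue
        candidate = window + [p]
        if any(_sync_error(q, candidate[0], p) > threshold
               for q in candidate[1:-1]):
            # Close the segment at the previous point, which becomes the anchor.
            anchor = window[-1]
            kept.append(anchor)
            window = [anchor, p]
        else:
            window = candidate
    if len(window) > 1:
        kept.append(window[-1])  # flush the last point at the end of the stream
    return kept


samples = [(0, 0.0), (1, 10.0), (2, 10.0), (3, 10.0), (4, 20.0)]
print(opening_window(samples, threshold=1.0))
# [(0, 0.0), (1, 10.0), (3, 10.0), (4, 20.0)]
```

Each new point triggers a re-check of the whole window, so the per-point cost grows with the window length; this is the kind of processing cost that can become a concern on a mobile device.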

Fig. 15 Network partitioning time


Acknowledgments This work was partially supported by the KISS ANR-11-INSE-005 grant. The authors would also like to thank the reviewers for their valuable suggestions, which significantly improved this journal article.

References

1. Brinkhoff T (2002) A framework for generating network-based moving objects. GeoInformatica 6(2):153–180

2. Cao H, Wolfson O (2005) Nonmaterialized motion information in transport networks. Proc. ICDT: 173–188

3. Cao H, Wolfson O, Trajcevski G (2003) Spatio-temporal data reduction with deterministic error bounds. DIALM-POMC 2003: 33–42

4. Cao H, Wolfson O, Trajcevski G (2006) Spatio-temporal data reduction with deterministic error bounds. VLDB J 15(3):211–228

5. Chazal F, Chen D, Guibas LJ, Jiang X, Sommer C (2011) Data-driven trajectory smoothing. In: Proceedings of the 19th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS 2011), pp. 251–260. Chicago, Illinois, USA

6. Civilis A, Jensen CS, Pakalnis S (2005) Techniques for efficient road-network-based tracking of moving objects. IEEE Trans Knowl Data Eng 17(5):698–712

7. de Almeida VT, Guting R (2005) Indexing the trajectories of moving objects in networks. GeoInformatica 9(1):30–60

8. Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. The Canadian Cartographer 10:112–122

9. Ehrlich J, Marchi M, Jarri P, Salesse L, Guichon D, Dominois D, Leverger C. LAVIA, the French ISA project: main issues and first results on technical tests. 10th World Congress & Exhibition on ITS, 16–20 November 2003, Madrid, Spain

10. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. 2nd Int’l Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, pp 226–231

11. Frentzos E, Gratsias K, Theodoridis Y (2009) On the effect of location uncertainty in spatial querying. IEEE Trans Knowl Data Eng 21(3):366–383

12. Garey MR, Johnson DS (1990) Computers and intractability; a guide to the theory of NP-completeness, New York

13. Gudmundsson J, Katajainen J, Merrick D, Ong C, Wolle T (2007) Compressing Spatio-temporal Trajectories. In: Proceedings of the 18th International Symposium on Algorithms and Computation (ISAAC), pp. 763–775. Sendai, Japan

14. Güting RH, de Almeida VT, Ding Z (2006) Modeling and querying moving objects in networks. VLDB J 15(2):165–190

15. Han B, Liu L, Omiecinski E (2012) NEAT: road network aware trajectory clustering. ICDCS 2012: 142–151

16. Hershberger J, Snoeyink J (1994) An O(n log n) implementation of the Douglas-Peucker algorithm for line simplification. In: Proceedings of the 10th Symposium on Computational Geometry, pp 383–384. Stony Brook, NY

17. Imai H, Iri M (1988) Computational morphology, chapter polygonal approximations of a curve – formulations and algorithms, pp 71–86. North-Holland Publishing Company, Netherlands

18. Kellaris G, Pelekis N, Theodoridis Y (2009) Trajectory compression under network constraints. Proc. SSTD: 392–398

19. Kharrat A, Sandu Popa I, Zeitouni K, Faiz S (2008) Clustering algorithm for network constraint trajectories. SDH 2008: 631–647

20. Koegel M, Baselt D, Mauve M, Scheuermann B (2011) A comparison of vehicular trajectory encoding techniques. In: Proceedings of the 10th Annual Mediterranean Ad Hoc Networking Workshop (MedHocNet ‘11), pp. 87–94. Favignana Island, Sicily, Italy

21. Lange R, Dürr F, Rothermel K (2011) Efficient real-time trajectory tracking. VLDB J 20(5):671–694

22. Lee J-G, Han J, Li X, Gonzalez H (2008) TraClass: trajectory classification using hierarchical region-based and trajectory-based clustering. PVLDB 1(1):1081–1094

23. Lee J-G, Han J, Whang K-Y (2007) Trajectory clustering: a partition-and-group framework. SIGMOD Conference 2007: 593–604

24. Meratnia N, de By RA (2004) Spatiotemporal compression techniques for moving point objects. Proc. EDBT: 765–782

25. Moustris G, Tzafestas SG (2008) Reducing a class of polygonal path tracking to straight line tracking via nonlinear strip-wise affine transformation. Math Comput Simul 79(2):133–148

26. Potamias M, Patroumpas K, Sellis TK (2006) Sampling trajectory streams with spatiotemporal criteria. Proc. SSDBM: 275–284

27. Richter K-F, Schmid F, Laube P (2012) Semantic trajectory compression: representing urban movement in a nutshell. J Spat Inf Sci 4(1):3–30

28. Roh G-P, Hwang S-W (2010) NNCluster: an efficient clustering algorithm for road network trajectories. DASFAA (2) 2010: 47–61

29. Sandu Popa I, Zeitouni K, Oria V, Barth D, Vial S (2011) Indexing in-network trajectory flows. VLDB J 20(5):643–669

30. Speicys L, Jensen CS (2008) Enabling location-based services - multi-graph representation of transportation networks. GeoInformatica 12(2):219–253

31. Trajcevski G, Cao H, Scheuermann P, Wolfson O, Vaccaro D (2006) On-line data reduction and the quality of history in moving objects databases. MobiDE: 19–26

Iulian Sandu Popa is an assistant professor in Computer Science at the University of Versailles Saint-Quentin (UVSQ) and has been a member of the INRIA Secured and Mobile Information Systems (SMIS) team since 2012. He received his Ph.D. in Computer Science from the UVSQ in 2009. He is also a Computer Science engineer from the University “Politehnica” of Bucharest, from which he graduated in 2005. His main research interests are embedded database management systems, spatiotemporal databases, and mobile data management and systems.

Karine Zeitouni received her Ph.D. in Computer Science from the University of Paris 6 in 1991. She is a professor in Computer Science at the University of Versailles Saint-Quentin. She heads the Data Integration and Mining (DIM) group at the PRiSM laboratory. Her main research interest lies in spatiotemporal databases and knowledge extraction, with a focus on applications in the fields of Transport, Environment and Health.


Vincent Oria is an associate professor of computer science at the New Jersey Institute of Technology in Newark, NJ, USA. His main research interest is multimedia databases, but he is also interested in spatial databases and recommender systems.

Ahmed Kharrat received his Ph.D. in Computer Science from the University of Versailles Saint-Quentin. He received his B.S. in Computer Science from the University of Sfax and his M.S. from the University of Rouen. His research interests focus on spatiotemporal databases and knowledge extraction.
