
Computing and visualizing popular places



Knowl Inf Syst, DOI 10.1007/s10115-013-0639-5

REGULAR PAPER


Marta Fort · J. Antoni Sellarès · Nacho Valladares

Received: 21 July 2011 / Revised: 18 March 2013 / Accepted: 5 April 2013
© Springer-Verlag London 2013

M. Fort · J. A. Sellarès · N. Valladares (B)
Informàtica i Matemàtica Aplicada, Universitat de Girona, Girona, Spain
e-mail: [email protected]

Abstract Data analysis and knowledge discovery in trajectory databases is an emerging field with a growing number of applications such as managing traffic, planning tourism infrastructures, analyzing professional sport matches or better understanding wildlife. A well-known collection of patterns which can occur for a subset of trajectories of moving objects exists. In this paper, we study the popular places pattern, that is, locations that are visited by many moving objects. We consider two criteria, strong and weak, to establish either the exact number of times that an object has visited a place during its complete trajectory or whether it has visited the place, or not. To solve the problem of reporting popular places, we introduce the popularity map. The popularity of a point is a measure of how many times the moving objects of a set have visited that point. The popularity map is the subdivision of the plane into regions where all the points of a region have the same popularity. We propose different algorithms to efficiently compute and visualize popular places, the so-called popular regions and their schematization, by taking advantage of the parallel computing capabilities of the graphics processing units. Finally, we provide and discuss the experimental results obtained with the implementation of our algorithms.

Keywords Movement patterns · Popular places · Computational geometry · Graphics processing units

1 Introduction

Mobile devices, for instance, GPS and mobile phones, are able to capture the trajectories of moving objects, which are called entities, such as vehicles, people or animals. The trajectory of an entity is described by a sequence of points, where each point represents the position of the entity at a specific instant of time. Assuming that an entity moves between two consecutive positions without changing direction, its trajectory path is modelled by the 2D polygonal line whose vertices are the trajectory points.



Trajectory databases, in many cases rather large in volume, contain valuable and implicit knowledge that can be extracted using geometry analysis, data mining techniques and visual analytic methods [13,23,29,35]. Data analysis and knowledge discovery in trajectory databases is an emerging field, and one which is essential to substantiate decision making. There is a growing number of applications, such as traffic management, infrastructure planning in tourism, analysis of professional sport matches or gaining a better understanding of wildlife, where this could be applied.

Despite being a relatively young research field, patterns in trajectory databases have been widely studied [5,26]. In most cases, trajectory databases refer to many entities that make individual trajectories meaningful, so there has been specific interest in finding patterns that are exhibited by many entities, i.e., group movement patterns. In a transportation context, movement patterns can be used to detect traffic jams. The movement patterns of tourists within a destination can serve to manage tourism infrastructures. In a football scenario, coaches can analyze player movement during matches to learn about the behavior and strategies of opposing teams. In the case of animals, movement patterns can be viewed as the spatiotemporal expression of behaviors as, for example, in flocking sheep.

Group movement patterns have a geometric modelling which can be directly used in the design of the algorithms being proposed to detect such patterns. Initial works describe several exact and approximate algorithms, based on computational geometry techniques, that detect group movement patterns like flock, convergence and leadership [15–19]. Algorithms for clustering similar trajectories of moving objects are presented in [22,28,29,34]. The study of movement patterns has increasingly focused on developing more accurate definitions of the different patterns and more efficient algorithms to detect them. For example, Benkert et al. [4] described a new algorithm for detecting flock patterns, and Andersson et al. [1] presented a definition and an algorithm for leadership patterns. Dodge et al. [8] suggested a possible pattern classification according to the information they record.

We are interested in the problem of detecting popular places among trajectory paths, that is, locations that are visited by many entities. The popular place problem only reveals, without taking into account the temporal dimension of the movement, how many times a place has been visited by the entities. The detection of popular places has multiple applications in real life: in traffic analysis, for example, when we are interested in determining noise or pollution levels in highly transited areas; in tourism management, to determine locations in a historical town which are most frequently visited by tourists; in marketing, to ensure the effectiveness of an advertisement in a mall by determining how many people have seen it. Depending on the application, it is useful to know the exact number of times that an entity has visited the place (strong criterion) or simply to know whether it has visited the place or not (weak criterion). In the traffic analysis example, the total number of times that a car has been in a place needs to be counted when looking for the strong popular places. On the other hand, in the tourism management example, it does not matter whether a tourist has been in the place once or more than once; we are simply interested in how many different tourists have been there, so in this case we are dealing with a weak popular place. In the marketing example, both criteria could be applied. Thus, we can consider that the more times you see an advertisement, the more effective it becomes, i.e., considering the strong criterion, or we could take the weak criterion into consideration if we are simply interested in maximizing the number of different potential clients.

Figure 1, which is commented on further in Sect. 5.1, presents a more detailed example. Figure 1a shows the trajectories of 10,000 ants looking for food in a square region, according to a NetLogo [36] simulation. In Fig. 1b, c, the colored points are popular points according to the weak and strong criteria, respectively. They are colored from yellow to red, according to their popularity.


Fig. 1 a The trajectory paths followed by the ants. b and c the popular regions under the weak and strong criteria colored according to popularity. White regions are not popular, yellow regions have little popularity, and the red regions are the most popular ones (color figure online)

The white areas are the non-popular ones, and the red ones are the most popular or most visited ones. The red areas in Fig. 1b have been visited by the biggest number of different ants during the simulation. Meanwhile, the red areas in Fig. 1c have received the greatest number of visits. Notice that the yellow area in Fig. 1c is bigger than that in Fig. 1b, meaning that the same ant has gone through the same points more than once, thus increasing the number of visits from the weak to the strong criterion. Therefore, when an ant explores an area as it looks for food, it goes through the same points more than once. In both images, we can identify a circle in the center, which corresponds to the anthill. However, it only becomes one of the most popular places with the weak criterion. The food locations are also visible for both criteria: they are the red region to the north-west of the anthill, the red weak region to the south-west and the blue east region. Notice that, while no information can be obtained from the original trajectories in Fig. 1a, once the popular places problem is solved, useful and interesting information can be extracted from Fig. 1b, c, demonstrating that the popular places problem is an interesting one to solve.

Popular places were first studied by Benkert et al. [3]. They looked for regions that had been visited by many entities, according to their trajectories. The authors model trajectory paths as 2D polygonal lines defined by trajectory points, without considering their temporal aspect. Two different models are presented: a discrete model, where only the vertices of the trajectory paths are considered, and a continuous model, where the entire trajectory paths are taken into account, along with the vertices and the line segments joining the vertices of two consecutive time steps (see Fig. 2). According to their definition, given a set T of n trajectory paths, each with τ vertices, r > 0, a real value modeling proximity, and k > 0, an integer value modeling popularity, a point p is an (r, k)-popular point in the discrete model if the axis-aligned square S(p, r) of center p and side length r contains the vertices of at least k different trajectory paths of T. In Fig. 2a, point p is (r, 2)-popular because S(p, r) contains vertices of two different trajectories, while q is (r, 1)-popular, because only the vertices of one trajectory are contained in S(q, r). In the continuous model, p is an (r, k)-popular point if there are at least k different trajectory paths of T that intersect the square S(p, r). Accordingly, in Fig. 2a, points p and q are (r, 2)-popular points because the two trajectories intersect either S(p, r) or S(q, r). After providing these definitions, Benkert et al. [3] define the (r, k)-popular regions as the maximal connected sets of (r, k)-popular points and show that these regions are polygons. Observe that the complexity of solving the continuous case is bigger than that of the discrete case, as it is harder to handle because it considers all the points defining the trajectories instead of only their vertices.



Fig. 2 Discrete and continuous model examples using a square in a and a disk in b

Moreover, the popular places obtained under the continuous model always contain those obtained in the discrete one. Finally, Benkert et al. [3] present algorithms to compute the collection of (r, k)-popular regions, Pr,k(T). These algorithms take, for the discrete and continuous model, O(τn log τn) and O(τ²n²) time, respectively, and need O(τn) and O(τn + V) space, where τn represents the size of T and V denotes the total number of vertices of Pr,k(T). They also argue that it is likely that the best algorithm to solve the continuous model of the problem requires Ω(τ²n²) time in the worst case. They do not provide any results of the implementation of their algorithms.

Benkert et al. [3] end their paper by remarking that another natural way of defining a popular place is by using a disk instead of a square and that more sophisticated techniques are needed to handle this new problem. Note that, when proximity is measured by Euclidean distance, the results obtained using a disk are more accurate than those obtained using a square, as can be seen in Fig. 2, where point p has popularity 1 for the case of a disk and 2 for a square.

In this paper, we will focus on solving the latter problem, that of detecting popular regions in the continuous model, when proximity is determined by a disk D(p, r) of center p and radius r, instead of the square S(p, r). Moreover, to satisfy the needs of each particular application, we introduce weak and strong criteria for counting the number of intersections of a trajectory with the disk D(p, r).

At times, to reduce the complexity of the popular regions and focus on the relevant information, we may be interested in obtaining a schematization by simplifying the geometry of the regions. To this end, we use the skeleton [32] to obtain a good schematization of the popular regions. The proposed visualization methods improve the understanding of the entities' movement, revealing complicated structures that would be hard, if not impossible, to capture otherwise.

The increasing programmability and high computational rates of graphics processing units (GPUs) make them a compelling platform, as an alternative to CPUs, for computing demanding tasks. GPU architecture, which contains many processors, provides programmers with a high-performance parallel computational power. Programmers exploit GPU capabilities by using the available languages and tools provided by the GPU manufacturers.

General-purpose computing on GPUs (GPGPU) is capturing the attention of researchers in many computational fields ranging from numeric computing operations and physical simulations to bioinformatics and data mining [6,9,27].

In this paper, we present algorithms for computing and visualizing popular places, in the continuous model and when proximity is determined by a disk, for both the weak and strong criteria. As we will see in Sect. 3.1, computing the collection of (r, k)-popular regions, Pr,k(T), in the worst case is likely to require quadratic time in the size of the set T of trajectories.


Thus, by working toward practical solutions and to obtain good running times, we use GPU parallel computing capabilities together with computational geometry techniques to solve the problem. Although our proposed approach means the reported solutions are approximate, this is acceptable as the data of moving entities are also approximate. We also describe here the way of visualizing the popular regions obtained and their schematization. To facilitate the analysis of trajectory data, our implementation allows constraints to be added. In the input, the trajectories are shortened by considering only those time steps contained in a selected time interval, and in the output, a minimum area or length can be required for the popular regions or their schematization, respectively. Finally, we present and discuss experimental results obtained with the implementation of our algorithms.

Partial preliminary versions of this paper can be found in [10,11]. We have added to our previous work the analysis of the complexity of the algorithms and the study of the error incurred when they are performed. We also provide a wide discussion of the results achieved with the implementation of the algorithms, which we have run with many different data sets. We present a new method, one which runs up to 150 times faster than the previous one, for computing the schematization of the results. Finally, we have included the possibility of adding constraints to the input data, so that more applications of the algorithm can be addressed.

2 The GPU: a general-purpose computer

There are mainly two kinds of GPU programming languages: shader languages such as Cg (C for graphics) and general-purpose languages such as OpenCL (Open Computing Language). Programming with shader languages allows the use of rendering and visualization hardware features directly. General-purpose languages give access to the GPU CUDA architecture while avoiding a graphics context.

In this paper, we use both kinds of languages depending on the problem to be solved. Popular regions for both criteria are obtained using the OpenGL graphics pipeline and Cg. Meanwhile, the CUDA architecture is used when the schematization of the obtained popular regions is desired. Both kinds of GPU programming languages are presented in the following subsections.

2.1 OpenGL graphics pipeline and Cg

The OpenGL graphics pipeline [31] is divided into several stages. The input is a list of 3D geometric primitives expressed as vertices defining points, lines, etc., with associated attributes. The output is a buffer (also called image) corresponding to a two-dimensional grid whose cells are called pixels. The grid dimension is H × W, where H and W are the height and width screen resolution, respectively. In the first stage of the pipeline, per-vertex operations take place. Each input vertex is transformed from 3D coordinates to window coordinates, thus obtaining 2D primitives. The following stage (rasterization) rasterizes every primitive obtained according to the screen resolution, and fragments, with their associated attributes, are obtained. Fragment attributes are determined from the vertex attributes by using linear interpolation. Note that, since different primitives can be projected onto the same 2D space, several fragments can belong to the same pixel position. The last stage, the fragment stage, computes the color, alpha and depth value of each fragment, thus determining the final output.

The color of each pixel is obtained by taking into account all the fragments corresponding to the given pixel. In addition, some tests are done to determine whether a fragment has to be painted or not. The tests provided by the graphics pipeline are the scissors, alpha, depth and stencil tests.


They are executed only if they have been enabled and are processed in this exact order.

The operations performed in the graphics pipeline form a sequence of non-user-controlled consecutive operations. Shader languages like Cg provide the user with the ability to modify some pre-established operations at each stage, as well as the capacity to adapt the pipeline to their needs. This is done by the so-called shaders, small programs which modify stage behavior. The most common programmable parts of the graphics pipeline are the vertex, geometry and fragment shaders.

2.2 CUDA architecture

Compute unified device architecture (CUDA) [24,25] is the computing engine in NVidia GPUs which is designed to support various languages and application programming interfaces. It is accessible to software developers through variants of industry standard programming languages like CUDA C or OpenCL.

The main advantages of CUDA vs. shader languages are that we can read and write from arbitrary addresses in memory and perform faster downloads and readbacks to and from the GPU. These advantages, which are not present in shader languages, make CUDA the best option in certain situations.

The basis of the CUDA programming model is the thread. Threads are lightweight processes which are easy to create and to synchronize. They are executed in parallel and each one computes a result element. Thus, each thread's computation is independent of the other threads' computations.

The user writes programs called kernels, which are executed in the GPU by, typically, thousands of threads. The GPU contains a number of multiprocessors consisting of a set of independent processors. Each processor is responsible for executing one thread. Thus, we can imagine that all these processors are executing threads at the same time. Threads are organized into blocks, and blocks into grids, in order to schedule the kernel execution. The dimensions of the grids and the blocks are defined by the user according to the problem needs. However, dimensions have to be carefully set since they play a crucial role in obtaining good kernel execution times.

CUDA allows users to access the GPU memory concurrently. Using the type of memory that best suits the task, along with the correct access pattern, can provide significant performance benefits. There are different GPU memories, which are classified by their access speed and physical distance to the processors.

3 Basic definitions

Given a set E = {e_0, ..., e_{n−1}} of n moving entities, the trajectory path t_i is a sequence of τ points in the plane t_i: p^i_0, ..., p^i_{τ−1}, where p^i_j denotes the position of entity e_i at time j, with 0 ≤ j ≤ τ − 1 (Fig. 3a). We assume that the movement of an entity e_i from its position p^i_j to its position p^i_{j+1} is described by the straight-line segment joining p^i_j and p^i_{j+1}. Consequently, the trajectory path t_i is described by the polygonal line (polyline, for short) which may self-intersect and whose vertices are the trajectory points t_i: p^i_0, ..., p^i_{τ−1}.

A sub-trajectory path s_i of trajectory path t_i is a trajectory path s_i: q^i_0, ..., q^i_{ξ−1} such that there exists j ∈ [0, ..., τ − ξ] with point q^i_0 located on edge p^i_j p^i_{j+1} of t_i, q^i_1 = p^i_{j+1}, ..., q^i_{ξ−2} = p^i_{j+ξ−2}, and q^i_{ξ−1} located on edge p^i_{j+ξ−2} p^i_{j+ξ−1} of t_i (Fig. 3b).


Fig. 3 a Trajectories of two entities and their corresponding trajectory paths. b A trajectory path t_i (dashed polyline) and a possible sub-trajectory path s_i of t_i

Fig. 4 The number of intersections between D(p, r) and the set of trajectory paths is 3 for the weak criterion, and 5 for the strong criterion

In this paper, we focus on the problem of detecting popular places among a set T = {t_0, ..., t_{n−1}} of n trajectory paths in the continuous case. In order to model the proximity between a point p and the trajectory paths, a disk D(p, r) of center p and radius r is used. Moreover, to satisfy the needs of each particular application, we introduce weak and strong criteria for counting the number of intersections of a trajectory path with the disk D(p, r) (Fig. 4).

Weak criterion. The number of intersections between the trajectory path t and the disk D(p, r) is 1 when t intersects the disk, and 0 otherwise.

Strong criterion. The number of intersections between a trajectory path t and the disk D(p, r) is the number of maximal sub-trajectory paths of t contained in D(p, r).

The following definitions are valid for both strong and weak criteria.

Let T be a set of n trajectory paths, r > 0 a real value and k > 0 an integer. A point p is an (r, k)-popular point if the total number of intersections between the disk D(p, r) of center p and radius r and the trajectory paths of T is at least k.

An (r, k)-popular region is a maximal connected set of (r, k)-popular points. An (r, k)-popular region is bounded by straight-line segments and circular arcs. We denote by Pr,k(T) the collection of (r, k)-popular regions (see Fig. 5).

For a given parameter r > 0, the popularity of a point p is the total number of intersections between the trajectory paths of T and the disk D(p, r) of center p and radius r.

The popularity map, Mr(T), is the partition of the plane into maximal connected regions so that all points of a region have the same popularity. Notice that the set of points of a given popularity may be composed of several independent connected regions. Popularity maps are structures from which it is easy to report popular regions.
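To make these definitions concrete, the following minimal CPU sketch computes the popularity of a single query point under both criteria. It assumes trajectories are given as vectors of 2D points; the struct layout and function names are ours, not the paper's GPU implementation, and the strong count is obtained by counting the maximal parameter intervals of each polyline that lie inside D(p, r).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
using Path = std::vector<Pt>;   // trajectory path: p_0, ..., p_{tau-1}

// Intersection of edge a-b with disk D(p,r), as a parameter interval [t0,t1] of the edge.
// Returns false when the edge does not touch the disk.
static bool edgeDiskInterval(Pt a, Pt b, Pt p, double r, double& t0, double& t1) {
    double dx = b.x - a.x, dy = b.y - a.y;
    double fx = a.x - p.x, fy = a.y - p.y;
    double A = dx * dx + dy * dy;
    double B = dx * fx + dy * fy;
    double C = fx * fx + fy * fy - r * r;
    if (A == 0.0) {                      // degenerate edge: a single point
        if (C > 0.0) return false;
        t0 = 0.0; t1 = 1.0; return true;
    }
    double disc = B * B - A * C;
    if (disc < 0.0) return false;        // supporting line misses the disk
    double s = std::sqrt(disc);
    t0 = std::max((-B - s) / A, 0.0);
    t1 = std::min((-B + s) / A, 1.0);
    return t0 <= t1;                     // otherwise the disk misses the segment
}

// Number of intersections of one trajectory path with D(p,r) under the strong criterion,
// i.e., the number of maximal sub-trajectory paths contained in the disk.
int strongCount(const Path& t, Pt p, double r) {
    int count = 0;
    bool prevEndsInside = false;         // previous edge ends with the path inside the disk
    for (std::size_t j = 0; j + 1 < t.size(); ++j) {
        double t0, t1;
        if (!edgeDiskInterval(t[j], t[j + 1], p, r, t0, t1)) { prevEndsInside = false; continue; }
        if (t0 > 0.0 || !prevEndsInside) ++count;   // the path (re-)enters the disk here
        prevEndsInside = (t1 == 1.0);
    }
    return count;
}

// Popularity of point p: total number of intersections over all trajectory paths of T.
int popularity(const std::vector<Path>& T, Pt p, double r, bool strong) {
    int pop = 0;
    for (const Path& t : T) {
        int c = strongCount(t, p, r);
        pop += strong ? c : (c > 0 ? 1 : 0);   // weak criterion: count each path at most once
    }
    return pop;
}
```

Evaluating such a test at every pixel of an H × W grid is, conceptually, what the GPU algorithms of Sect. 4 compute, although they do so by painting regions instead of testing points.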


Fig. 5 Example with three trajectory paths. a When we use the weak criterion and b the strong criterion. Note that points marked with crosses are (r, 2)-popular points. The disk in a is intersected by two trajectory paths and in b by two sub-trajectory paths, at least. Finally, in a, we have Pr,2(T) = {R1, R2} and in b Pr,2(T) = {R1, R2, ..., R10}

3.1 Hardness of computation of popular regions

Benkert et al. [3] show that a special case of the problem of computing Pr,k(T) in the continuous model, when proximity is determined by a square, belongs to the 3-SUM-hard class [12]. It is not difficult to see that this also holds true, for both weak and strong criteria, when proximity is determined by a disk. Consequently, using the same argument as [3], we can conclude that the best algorithm to compute Pr,k(T) in the continuous model when proximity is determined by a disk, for both the weak and the strong criteria, is likely to require Ω(τ²n²) time in the worst case, where τn is the size of the set T of trajectories. This motivated us to explore an efficient alternative GPU parallel approach for solving the problem.

4 Computing popularity maps

In this section, we explain how to compute the popularity map Mr(T) from an arrangement of regions related to the trajectory paths in T. The regions forming the arrangement depend on the counting criterion used, weak or strong.

Finding the popularity map under the weak and the strong criterion leads to two different, but similar, strategies. Both of them can be adapted to solve both problems; however, each one uses specific properties of the obtained arrangement, making each strategy the most suitable for the considered criterion, as is shown in Sect. 8.

4.1 Computing popularity maps under the weak criterion

For each edge e^i_j = p^i_j p^i_{j+1} of the polyline representing trajectory path t_i, we consider the offset region O_r(e^i_j) obtained by sweeping along e^i_j a disk of radius r whose center moves on e^i_j. We can represent the region O_r(e^i_j) as a rectangle and two semidisks, as shown in Fig. 6a. We denote by O_r(t_i) the union of the regions O_r(e^i_j), 0 ≤ j ≤ τ − 1, determined by the edges of the trajectory path t_i (Fig. 6b).


Fig. 6 a The offset region O_r(e^i_j). b The region O_r(t_i)


Fig. 7 a Weak popularity map. b Discretization of the weak popularity map

Observe that, when a point p ∈ O_r(t_i), then D(p, r) ∩ t_i ≠ ∅. Thus, the popularity map Mr(T) can be obtained from the arrangement of regions O_r(t_i), 0 ≤ i ≤ n − 1, since the number of regions in the arrangement containing a point coincides with the popularity of the point.

To compute Mr(T), we will discretize the plane into H × W points, such that each point corresponds to a pixel in the graphics pipeline. The idea is to paint each region O_r(t_i) in the GPU as a set of rectangles and disks. Inside the pipeline, all primitives will be rasterized into fragments according to the screen resolution H × W. Then, we will store the number of fragments corresponding to each pixel. In this way, we obtain the popularity of each pixel, and consequently, a discretization of the popularity map Mr(T) (Fig. 7).

OpenGL has functions to render points, lines, triangles and rectangles; however, since it does not have a function to render disks, we have to render a disk with Cg using a fragment shader. The disk D(p^i_j, r) is painted as a square centered at p^i_j of side 2r. Then, a fragment shader is activated so that fragments whose Euclidean distance to p^i_j is bigger than r are discarded.

We use the stencil and depth tests to count the number of fragments corresponding to a pixel and the stencil buffer to store them. Thus, we have to deal with some OpenGL pipeline issues, enable the stencil test and configure it. Note that two different fragments of the same trajectory path can correspond to the same 2D space, that is, to the same pixel, and with the weak criterion they should be counted only once. Using the default configurations of the stencil and depth tests, this pixel would be counted twice.


Fig. 8 Trajectory paths are rendered with different depth from z = 1 to z = 0

This can be solved by avoiding painting the auto-intersection points of the primitives, but this is not only difficult to compute, it is also very susceptible to errors. What we do instead is paint all the rectangles and disks of O_r(t_i) in a plane parallel to the XY-plane at a fixed depth value, and configure the depth test to only pass those fragments whose depth value is strictly smaller than the already stored one. In this way, fragments which coincide with any other fragment of the same trajectory path will have the same depth value, so that only the first fragment will pass the depth test and the following ones will be discarded. This results in pixels covered by region O_r(t_i) being counted once, even if several fragments belong to the same pixel. Note that the stencil test is configured to increment its stencil pixel value only if the fragment passes the depth test.

Thus, we paint each region O_r(t_i) at a different depth value z = z_i, from further (z = 1) to closer (z = 0) (see Fig. 8). This means that fragments of different trajectory paths will always pass the depth test, thus incrementing their stencil pixel value. As a result, the stencil buffer stores, for each pixel, the number of different trajectory paths that are at a distance smaller than r from the given pixel, that is, the discretized Mr(T).

There are some restrictions of the graphics hardware we have to deal with. The stencil and depth buffers are 8- and 32-bit precision buffers, respectively. This means that a buffer can overflow whenever we exceed the range [0 ... 2⁸ − 1] for the stencil or [0 ... 2³² − 1] for the depth buffer. If we do not take these constraints into account, whenever we have a point with popularity bigger than 255, or if we render more than 2³² trajectories, the buffers will overflow and the algorithm will report incorrect results.

If we work with data sets with thousands of trajectory paths, it will not be unusual to look for regions visited by more than 255 entities. Consequently, we paint, from further to closer, groups of 255 trajectory paths to ensure that the stencil will never overflow. For each painted group, we read the stencil buffer values back to the CPU and the values are accumulated in a CPU array of size H × W. Then, we clean the stencil and depth buffers (all values are set to 0) and we restart the process, until all entities have been painted. Despite the GPU–CPU communication being an expensive operation, we simply have to read from GPU to CPU n/255 times, where n is the number of trajectory paths.

Note that we also clean the depth buffer for every painted group. That is, we clean the depth buffer every 255 trajectories, far from its 2³² precision constraint. This approach avoids the depth buffer precision issue and removes any constraint on the number of entities in the input data.


Weak popularity map computation overview

The summary of the main steps for computing the discretized weak popularity map is as follows:

1. Initialize the depth and stencil buffers to 0 and set the stencil to be incremented if a fragment passes the depth test.
2. Paint each O_r(t_i) region at a different depth.
3. For every 255 painted regions, read the stencil values, accumulate them in a CPU array and set the depth and stencil buffers back to 0 again.
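As a rough illustration of these steps, the following OpenGL sketch shows one way the stencil/depth configuration and the per-group readback could look on the host side. It is a simplified sketch, not the paper's implementation: drawOffsetRegion is a hypothetical helper assumed to issue the rectangles and the 2r × 2r squares (clipped to disks by the Cg fragment shader) of O_r(t_i) at the given depth.

```cpp
#include <GL/gl.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: paints the rectangles and disk-covering squares of O_r(t_i) at depth z.
void drawOffsetRegion(int i, double r, float z);

// Weak-criterion popularity map: stencil counts per pixel, accumulated per group of 255 paths.
void weakPopularityMap(int nPaths, double r, int W, int H, std::vector<unsigned>& AM) {
    AM.assign(std::size_t(W) * H, 0u);
    std::vector<GLubyte> stencil(std::size_t(W) * H);

    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);                      // pass only strictly closer fragments
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);         // the stencil test itself always passes
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);    // increment only when the depth test passes

    for (int first = 0; first < nPaths; first += 255) {
        glClearDepth(1.0);
        glClearStencil(0);
        glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

        int last = std::min(first + 255, nPaths);
        for (int i = first; i < last; ++i)     // paint from further (z = 1) to closer (z = 0)
            drawOffsetRegion(i, r, 1.0f - float(i - first + 1) / 256.0f);

        // Read the stencil counts of this group back and accumulate them on the CPU.
        glReadPixels(0, 0, W, H, GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, stencil.data());
        for (std::size_t k = 0; k < stencil.size(); ++k) AM[k] += stencil[k];
    }
}
```

With GL_ALWAYS the stencil test itself never rejects fragments, so the stencil value of a pixel is incremented exactly when the depth test passes, as required by step 1.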

4.1.1 Complexity analysis

Among all the processes that take place in an OpenGL algorithm, we have to take into account the following considerations: (1) the time needed to process a fragment is small compared to the time needed to access a texture or copy information to a texture, and (2) reading a pixel back from GPU to CPU takes much more time than the other operations. The notation used for the complexity analysis is the following:

We denote by P_f the time needed to render and color f fragments, A_f the time spent making f accesses to a texture, C_f the time needed to copy f pixels from texture to texture, and finally, η_f the time needed to transfer f pixels from GPU to CPU.

We paint up to 2nτ squares of side 2r covering the disks centered on the trajectory path vertices. The painted area is 8nτr², corresponding to 8nτr²HW pixels. In order to paint the rectangles covering the trajectory path line segments, we paint rectangles of width 2r. Denoting by L the sum of all the trajectory path lengths, this represents 2LrHW painted pixels. Thus, the total number of painted pixels is (8nτr² + 2Lr)HW. On the other hand, every 255 trajectory paths we read the stencil buffer back to the CPU, and so, during the whole process, we read back to the CPU HWn/255 pixels, representing an extra O(η_{HWn/255}) time. Consequently, the algorithm time complexity is O(P_{(8nτr²+2Lr)HW} + η_{HWn/255}), and O(2HW) additional space is needed to maintain the stencil buffer.

4.2 Computing popularity maps under the strong criterion

When we use the strong criterion to count intersections, we must deal with the edges of a trajectory path, instead of the whole trajectory path. Given a value of r, we define P_r(e^i_j) = O_r(e^i_j) if j = 0 and P_r(e^i_j) = O_r(e^i_j) \ D(p^i_j, r), otherwise. These regions can be represented using a rectangle and two semidisks (Fig. 9). Note that, for 1 < j < τ − 1, one of the semidisks is subtracted.

Observe that, for a given trajectory path t_i, the number of regions P_r(e^i_j), 0 ≤ j ≤ τ − 1, containing a point p equals the number of maximal sub-trajectory paths of t_i contained in D(p, r) or, in other words, the number of intersections between the disk D(p, r) and the trajectory path t_i under the strong criterion. In consequence, the popularity map Mr(T) can be obtained from the arrangement of regions P_r(e^i_j), 0 ≤ j ≤ τ − 1, 0 ≤ i ≤ n − 1, since, again, the number of regions of the arrangement containing a point coincides with the popularity of the point. Notice that the intersection between the regions P_r(e^i_j) and P_r(e^i_{j+1}) determined by two consecutive edges e^i_j, e^i_{j+1} of trajectory path t_i defines a region such that a disk of radius r centered at any of its points intersects t_i in two maximal sub-trajectory paths, one contained in e^i_j and the other in e^i_{j+1}, and therefore, the number of intersections between t_i and the disk is two (Fig. 10a).


Fig. 9 a The region P_r(e^i_j) for j = 0. b The region P_r(e^i_j) for 1 < j < τ − 1


Fig. 10 a Strong popularity map. b Discretized strong popularity map

To compute Mr(T), we use the same idea as the one used in the case of the weak criterion. We discretize the plane into H × W points (one per pixel), and then, we paint the regions P_r(e^i_j) of the arrangement as a set of rectangles and disks (Fig. 10b).

We should paint each P_r(e^i_j) with a different depth value so that the stencil value is incremented for each different region. This way, a stencil value will also be incremented when fragments of different time steps coincide, even if they correspond to different time steps of the same trajectory path. The main drawback of this approach is that there is a huge number of data transfers from GPU to CPU. In the weak criterion, to avoid overflows in the stencil buffer, we read back the stencil values to the CPU every 255 trajectory paths, while in the strong criterion, we would have a read back every 255 time steps. Since the number of time steps may be very large, this makes the approach impractical.

In order to avoid these slow data read backs from GPU to CPU, we propose a new method based on the implementation of a stencil buffer in a fragment shader, that is, modifying the graphics pipeline behavior by using Cg shaders. The idea is to paint each time step with a different depth value and, inside the fragment shader, increment, for each fragment arriving in the pipeline, the color value of its pixel. When all the time steps have been painted, we can determine how many fragments correspond to each pixel, depending on its final color.

Since we have 3 channels of color (RGB) with 8 bits per channel [0 ... 255], for each fragment arriving in the fragment shader pipeline, the shader will increment by 1 the color value of the red channel. When the red channel overflows (255), the green channel is incremented by 1, and the red channel is restarted at 0. That is, the green channel indicates how many times the red channel has overflowed. The same process is applied to the blue channel when the green channel overflows. By recovering the final RGB values, we know how many fragments correspond to a given pixel.


Fig. 11 The P_r(e^i_j) regions rendered from z = 1 to z = 0

The algorithm proceeds as follows. We paint each region P_r(e^i_j) at a different depth value z = z^i_j, from further (z = 1) to closer (z = 0) (Fig. 11). The Cg fragment shader is used throughout the process, not just for adding the color values of all fragments, but also to paint a disk when needed, as explained in Sect. 4. Since we have a depth precision of 32 bits, we can paint up to 2³² time steps at different depth levels.

Notice that each disk corresponding to p^i_j, for 1 < j < τ − 2, is painted twice, once for each trajectory path segment it defines, and with different depth values. This causes the color value to be incremented twice where it should be counted only once. In addition, the overlapped regions between rectangles and disks must also be avoided. To solve this problem, we add a parameter to the fragment shader to know whether the disk fragments have to increment the color or only modify the depth value. The fragment shader updates the depth value, but does not increment the color, when painting the first disk. When the rectangle and the second disk are painted, both color and depth values are updated. Proceeding in this way, the overlapped fragments between the two disks and the rectangle will be discarded by the depth test, and the disks painted twice are counted just once.

When the number of time steps is smaller than 2³² and the popularity of a point is smaller than 2²⁴, no read backs to the CPU have to be done. This reduces the number of read backs from the GPU to zero in most applications.

The main difference between the two proposed algorithms, concerning the weak and strong criteria, is that the first one needs more readbacks to the CPU, and the second one, in most cases, needs no readbacks and, instead of using the stencil buffer, simulates it with a Cg fragment shader. Despite the stencil buffer being faster than a shader at counting fragments, the combination of stencil speed with read backs is slower than shader speed with no read backs. Thus, the strategy used to compute Mr(T) depends generally on the counting criterion. That is, for the weak criterion, it is better to use the stencil counting technique, whereas the strong one is faster to compute using the shader approach.

Strong popularity map computation overview

The summary of the main steps for computing the discretized strong popularity map is as follows:

1. Initialize the depth buffer and the three color channels (red, green, blue) of the color buffer to 0.


2. Paint each P_r(e^i_j) region at a different depth.
3. Use a fragment shader to accumulate the color in the buffer, depending on whether the fragment passes the depth test or not.
4. For every 2²⁴ painted regions, read the color values, accumulate them in a CPU array and reset the depth and color buffers to 0.
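The per-pixel count is recovered from the final color by undoing the carry scheme described above: red counts fragments, green counts red overflows and blue counts green overflows, so the count is R + 256·G + 256²·B. The following sketch, with a readStrongCounts helper of our own, shows how such a readback and decode could be done on the host; it is an illustration, not the paper's code.

```cpp
#include <GL/gl.h>
#include <cstddef>
#include <vector>

// Read the color buffer back and decode the RGB carry counter into per-pixel fragment counts,
// accumulating them into the CPU array AM (H x W).
void readStrongCounts(int W, int H, std::vector<unsigned>& AM) {
    if (AM.size() != std::size_t(W) * H) AM.assign(std::size_t(W) * H, 0u);
    std::vector<GLubyte> rgb(std::size_t(W) * H * 3);
    glReadPixels(0, 0, W, H, GL_RGB, GL_UNSIGNED_BYTE, rgb.data());
    for (std::size_t k = 0; k < AM.size(); ++k)
        AM[k] += unsigned(rgb[3 * k])              // red: units
               + 256u   * rgb[3 * k + 1]           // green: red overflows
               + 65536u * rgb[3 * k + 2];          // blue: green overflows
}
```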

4.2.1 Complexity analysis

The number of painted disks for each trajectory path is 2τ, of which only τ disks actively access texture values to update the color. They represent 8τr²nHW painted pixels and 4πr²τHW texture accesses. In the case of the rectangles, both the number of accesses to a texture and the number of pixels painted are 2LrHW, where L is the sum of all the trajectory path lengths. Finally, the information from texture B is copied to texture A a total of τ times per trajectory path, providing HWτn copied values. Thus, by using the notation presented in Sect. 4.1.1, the time complexity is O(P_{(8τr²n+2Lr)HW} + A_{(4πr²τn+2Lr)HW} + C_{nτHW}) and no extra space is needed.

4.3 Error analysis

The discretization of the O_r(t_i) regions into pixels leads to a precision error due to the rasterization process of the GPU pipeline. A pixel partially covered by O_r(t_i) can be painted, or not, depending on the rasterization criterion. According to Segal et al. (1994), a pixel is considered part of a region if the geometry of the region covers the center of the pixel. In our case, a pixel will be considered part of a region, and consequently painted, if any O_r(t_i) covers the center of the pixel.

The error occurs when a popular point lies within a non-painted pixel, or when a non-popular point lies within a painted pixel. In both cases, the maximum error produced is d, where d is half of the pixel diagonal. It is not difficult to see that d increases when the covered area of the input data increases, and decreases when the screen resolution increases.
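For reference, if we assume the covered area is an axis-aligned A_x × A_y rectangle mapped onto the W × H pixel grid (an assumption of ours, not stated explicitly in the text), the half-diagonal bound can be written as

d = (1/2) √( (A_x/W)² + (A_y/H)² ).

For a square area this reduces to d = (√2/2)(A_x/W), which is consistent with the errors reported in Table 1; for instance, a 25 × 10⁶ m² (5000 m × 5000 m) area at 512 × 512 gives d ≈ 6.9 m.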

Given a desired error, we can determine the discretization required, according to the covered area dimensions. In the case that the required discretization is not supported by the GPU, the covered area is subdivided into several smaller suitable regions and the algorithms are run independently in each of them. However, in general, resolutions much smaller than the GPU bounds, which nowadays are 16384 × 16384, are adequate for real applications to guarantee accurate results.

5 Popular regions computation

An (r, k)-popular region of Pr,k(T) is a maximal connected region composed of regions of the popularity map Mr(T) whose points have popularity at least k.

A discretization of Pr,k(T) can be easily obtained from the discretized Mr(T). We store the discretization in a 2D array AP of the same dimension, H × W, as the array AM that stores the discretized popularity map. We create AP according to the following criterion: AP(i, j) = 1 if AM(i, j) ≥ k and AP(i, j) = 0, otherwise. The resulting array represents a binary image corresponding to the discretization of the collection of (r, k)-popular regions.
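In code, this thresholding step amounts to a single pass over the discretized popularity map; the sketch below assumes AM is stored row-major, which is our convention rather than the paper's.

```cpp
#include <cstddef>
#include <vector>

// Binary image AP of the (r,k)-popular regions from the discretized popularity map AM
// (H x W, row-major): AP(i,j) = 1 iff AM(i,j) >= k.
std::vector<int> popularRegions(const std::vector<unsigned>& AM, unsigned k) {
    std::vector<int> AP(AM.size());
    for (std::size_t idx = 0; idx < AM.size(); ++idx)
        AP[idx] = (AM[idx] >= k) ? 1 : 0;
    return AP;
}
```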


5.1 Popular regions visualization

The visualization of popular regions provides important reasoning mechanisms to improve the understanding of the entities' movements, giving end users images from which they are able to get a quick idea of the situation [2].

We could visualize Pr,k(T) as a binary image where those pixels whose popularity value is smaller than k are painted black, while the others are painted white. Alternatively, in order to obtain better visual information, we visualize Pr,k(T) from the discretized Mr(T), where pixels of Mr(T) with values smaller than k are painted white, and the remaining pixels are painted according to their popularity. Then, instead of having just two colors, we uniformly distribute the popularity range values among the whole RGB range (red, green and blue). In particular, when k = 1, we obtain the visualization of the popularity map Mr(T).

Figures 1 and 12 show examples where a gradient of three colors is used. Regions of small popularity values are painted in yellow, middle values in blue and the most popular ones in red. Note that, as expected, differences between using the weak or the strong criterion are obtained. Consequently, before starting to solve a specific problem in a realistic situation, we have to take care to choose the appropriate criterion according to the different information they report and to the problem we are dealing with.
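One possible mapping from popularity values to such a three-color gradient is sketched below. The exact color stops and interpolation are our choice, consistent with the yellow–blue–red description above; pixels below the threshold k are painted white.

```cpp
struct RGB { unsigned char r, g, b; };

// Map a popularity value to a color: white below k, then a yellow-blue-red gradient
// uniformly spread over [k, maxPop].
RGB popularityColor(unsigned pop, unsigned k, unsigned maxPop) {
    if (pop < k) return {255, 255, 255};
    const RGB stops[3] = {{255, 255, 0}, {0, 0, 255}, {255, 0, 0}};  // yellow, blue, red
    double u = (maxPop > k) ? double(pop - k) / double(maxPop - k) : 1.0;  // in [0,1]
    double s = u * 2.0;                 // position along the two gradient segments
    int    i = (s < 1.0) ? 0 : 1;
    double f = s - i;
    auto lerp = [](unsigned char a, unsigned char b, double f) {
        return (unsigned char)(a + f * (b - a) + 0.5);
    };
    return { lerp(stops[i].r, stops[i + 1].r, f),
             lerp(stops[i].g, stops[i + 1].g, f),
             lerp(stops[i].b, stops[i + 1].b, f) };
}
```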

In the example of Fig. 1, a group of ants looks for food in a region. When an ant finds food, it takes a piece of food and transports it to the anthill, leaving a trail which is followed by the rest of the ants. Ant movement between the food and the anthill can be observed in both Fig. 1b, c. When ants do not find food, they move randomly over the region looking for any place with food. No ant can return to the anthill without food. Note that, when using the weak criterion, the most popular place is the anthill, which is located in the center. This is reasonable, because at the beginning, every ant has visited the anthill at least once, on its way out to look for food. So, any other region could be, at most, as popular as the anthill. By contrast, using the strong criterion, we observe that the most popular region is not the anthill, despite it having been visited at least once, and sometimes twice or more, by those ants that have found food. This leads to the conclusion that ants have spent more time looking for food than finding it.

Another example is shown in Fig. 12. It represents a group of people moving in a city with perpendicular streets. Imagine that we want to know the most suitable location for a pollster. Obviously, we want him to poll as many people as possible, so we need to know the busiest area. The information provided by the Pr,k(T) visualization using the weak criterion (Fig. 12a) shows that the most popular places are the two street intersections in the upper-left corner. On the other hand, the information reported when we use the strong criterion (Fig. 12b) shows that the most popular region is the second street starting from the right. The street intersections marked as more popular when we use the weak criterion are the places visited by many different people. However, while the most popular street with the strong criterion is a very busy street, it may be visited by the same people many times. Since a poll is taken only once per person, we are interested in the popular regions reported when the weak criterion is used.

6 Popular region schematization

Sometimes, it may be interesting to obtain a schematization of the popular regions by simplifying their geometry, in order to reduce complexity and to focus on relevant information.

We will use the skeleton of the discretized Pr,k(T ) to schematize popular regions.


Fig. 12 Visualization of Pr,k (T ) under a weak and b strong criteria, respectively

6.1 The skeleton of a multi-object binary image

Let Ω be a multi-object binary image in a 2D discrete space D and δΩ its boundary. The feature transform assigns to each pixel p ∈ D the set F(p) = arg min_{q ∈ δΩ} {d(p, q)} of its nearest boundary pixels under the Euclidean distance. A simple feature transform computes only one nearest feature pixel S(p) per pixel p, which is sufficient for many applications.

The skeleton or medial axis [32] is defined as the locus of pixels at equal distance from at least two boundary pixels:

S(δΩ) = {p ∈ D | ∃ q, r ∈ δΩ, q ≠ r : d(p, q) = d(p, r)}.

Given a pixel p = (i, j) ∈ D, we will denote S_{i,j} = S(p). Once {S_{i,j} | ∀(i, j) ∈ D} is computed, we can quickly generate an approximated skeleton as:

S̃(δΩ) = {(i, j) ∈ D | max{d²(S_{i+1,j}, S_{i,j}), d²(S_{i,j+1}, S_{i,j})} > λ}

where λ is a given fixed parameter that determines when a transition of the nearest feature pixel between adjacent pixels is big enough to be considered a skeleton pixel.

6.2 Computing the skeleton

In order to obtain an approximated skeleton S̃(δΩ) of the discretized Pr,k(T), the collection of (r, k)-popular regions, we: (1) extract the contour of Pr,k(T); (2) compute the simple feature transform S(δΩ) of the extracted contour; and (3) generate the approximated skeleton from S(δΩ). We implement these three steps inside the GPU by using OpenCL over the CUDA architecture, without any readback to the CPU except for the final result S̃(δΩ).

Extracting the contour

Many methods have been proposed to extract the contour of a binary image [14]. A well-known approach for detecting contours is image filtering. There are many possible filters and each one is appropriate, or not, depending on the features of the input image. Since our input image, Pr,k(T), is a base case of contour detection (totally binary and noise free), we apply a derivative filter. For each pixel, the (1, −1) filter is applied in x and y so that the transition between regions of 0's and regions of 1's is detected.


Note that we are shifting each contour pixel a half pixel right and a half pixel down, since the real transition from 0 to 1 comes between the two pixels.

We compute the contour with an OpenCL kernel by assigning a thread to each pixel. Then, using the derivative filter, we determine whether each pixel is a boundary point. In this way, we get a binary image that we then store in a 2D array AC of dimension H × W, so that AC(i, j) = 1 if (i, j) is a contour pixel and AC(i, j) = 0, otherwise.
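A serial CPU version of this step, written as a sketch of what each OpenCL thread does for its pixel, could look as follows (the row-major layout and function name are our assumptions):

```cpp
#include <vector>

// CPU sketch of the contour extraction (the paper runs one OpenCL thread per pixel).
// AP is the binary image of the popular regions, row-major H x W; AC gets the contour.
void extractContour(const std::vector<int>& AP, int H, int W, std::vector<int>& AC) {
    AC.assign(AP.size(), 0);
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j) {
            int v = AP[i * W + j];
            int dx = (j + 1 < W) ? v - AP[i * W + (j + 1)] : 0;   // (1,-1) filter in x
            int dy = (i + 1 < H) ? v - AP[(i + 1) * W + j] : 0;   // (1,-1) filter in y
            if (dx != 0 || dy != 0) AC[i * W + j] = 1;            // 0/1 transition found
        }
}
```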

Computing the simple feature transform

The computation of the simple feature transform S(δΩ) is an expensive operation, because finding the nearest contour pixel of each one of the H × W pixels of the discretized plane is required. However, since the proximity of one pixel to the contour pixels is independent of that of the others, we compute the simple feature transform by using OpenCL kernels which simultaneously compute the nearest contour pixel of every pixel.

To efficiently compute S(δΩ), we make use of the ideas described in [20] to solve nearest neighbor queries. First, a uniform grid over the set of contour pixels (which can be obtained from array AC) is built, and next, the grid is used to simultaneously search for the nearest contour pixel of the H × W pixels of the discretized plane. Both parts, the grid construction and the nearest neighbor search, are done in parallel using OpenCL. We store the obtained simple feature transform in a 2D array AS of dimension H × W.
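For clarity, the brute-force version of the query each pixel has to answer is sketched below; the paper's grid-based OpenCL implementation [20] computes the same AS array, only much faster. The data layout is an assumption of ours.

```cpp
#include <limits>
#include <utility>
#include <vector>

// Reference CPU computation of the simple feature transform: for every pixel, one nearest
// contour pixel (the paper replaces this brute-force search with a uniform grid on the GPU).
void simpleFeatureTransform(const std::vector<int>& AC, int H, int W,
                            std::vector<std::pair<int,int>>& AS) {
    std::vector<std::pair<int,int>> contour;
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            if (AC[i * W + j]) contour.push_back({i, j});

    AS.assign(std::size_t(H) * W, {-1, -1});
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j) {
            long best = std::numeric_limits<long>::max();
            for (const auto& c : contour) {            // nearest contour pixel (squared distance)
                long di = c.first - i, dj = c.second - j;
                long d2 = di * di + dj * dj;
                if (d2 < best) { best = d2; AS[i * W + j] = c; }
            }
        }
}
```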

Generating the approximated skeleton

The approximated skeleton S̃(δΩ) is obtained as a distance calculation between the simple feature transforms of adjacent pixels, thresholded by parameter λ. We store the skeleton in a 2D array AK of dimension H × W, so that AK(i, j) = 1 if max{d²(AS(i + 1, j), AS(i, j)), d²(AS(i, j + 1), AS(i, j))} > λ and AK(i, j) = 0, otherwise. Again, this operation can be done using an OpenCL kernel where each thread computes a pixel value. Then, the result is read back from the GPU to AK.
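The per-pixel work of that kernel is just the thresholded comparison of Sect. 6.1; a CPU sketch, treating λ as a threshold on squared distances as in the formula above, is:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// AK(i,j) = 1 when the nearest contour pixel stored in AS jumps by more than lambda
// between pixel (i,j) and its right/bottom neighbors.
void approximateSkeleton(const std::vector<std::pair<int,int>>& AS, int H, int W,
                         long lambda, std::vector<int>& AK) {
    auto d2 = [](std::pair<int,int> a, std::pair<int,int> b) {
        long di = a.first - b.first, dj = a.second - b.second;
        return di * di + dj * dj;
    };
    AK.assign(std::size_t(H) * W, 0);
    for (int i = 0; i + 1 < H; ++i)
        for (int j = 0; j + 1 < W; ++j) {
            long jump = std::max(d2(AS[(i + 1) * W + j], AS[i * W + j]),
                                 d2(AS[i * W + (j + 1)], AS[i * W + j]));
            if (jump > lambda) AK[i * W + j] = 1;
        }
}
```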

Note that the three steps to compute the skeleton are executed entirely inside the GPU. We write Pr,k(T) into GPU memory in the first step, and we read back AK to the CPU in the final step. Minimizing transfers in this way is a high-priority practice for any GPU algorithm.

6.2.1 Complexity analysis

The complexity analysis of algorithms that use CUDA not only gives the total work of an algorithm, which represents the total number of operations done in order to solve the problem, but also gives the step complexity, which is the time that the algorithm would take if we had an infinite number of parallel processors (threads). We also reflect the number of accesses to memory and the amount of information transferred. The time needed to read f and write g values from global memory is represented by G_{f+ρg}; when shared memory is used, it is denoted by S_{f+ρg}; and finally, T_{f+ρg} represents f transferred values from CPU to GPU and g transferred values from GPU to CPU.

The time complexity of this part of the algorithm is the sum of computing the derivative filter, finding the simple feature transform and, finally, obtaining the thresholded distance.

Computing the derivative filter for the HW pixels has a complexity of O(HW + T_{HW} + G_{4HW+HWρ}). For the simple feature transform, obtained by using a grid to find the nearest neighbor, we denote by S_f, S_{ft}, S_{fw} and S_{fr} the total work, the number of values transferred to the GPU, and the values written to and read from global memory, respectively. Thus, the second step complexity is O(S_f + T_{S_{ft}} + G_{S_{fr}+S_{fw}ρ}).


Fig. 13 a A set of ant trajectory paths, b their weak popularity map and c the corresponding skeleton

Finally, obtaining S̃(δΩ) yields a O(HW + G_{4HW+HWρ}) complexity. Summing them all up, the algorithm that computes the skeleton has complexity O(HW + S_f + T_{HW+S_{ft}} + G_{HW+S_{fr}+(HW+S_{fw})ρ}).

6.3 Popular region schematization visualization

The skeleton, used to schematize the popular regions, can be visualized using the reported array AK, which is a binary image. We initialize the stencil buffer with the AK values and the color buffer with white. We enable the stencil test and set it to pass only those fragments whose stencil value is equal to 1. Next, we paint a colored rectangle covering the whole screen; only the pixels representing the skeleton of Pr,k(T) pass the stencil test and are colored (see Figs. 14c, 13c).
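A legacy-OpenGL sketch of this visualization pass is given below. It assumes identity modelview and projection transforms and a framebuffer with a stencil attachment; loading the stencil buffer with glDrawPixels is one possible way to initialize it from AK, not necessarily the one used in the paper.

```cpp
#include <GL/gl.h>
#include <vector>

// Load AK into the stencil buffer, then let only stencil==1 pixels of a
// full-screen colored rectangle through.
void drawSkeleton(const std::vector<GLubyte>& AK, int W, int H) {
    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);            // white background
    glClear(GL_COLOR_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    glWindowPos2i(0, 0);
    glDrawPixels(W, H, GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, AK.data());  // stencil := AK

    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_EQUAL, 1, 0xFF);                // pass only skeleton pixels
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

    glColor3f(1.0f, 0.0f, 0.0f);                     // skeleton color
    glBegin(GL_QUADS);                               // full-screen rectangle
    glVertex2f(-1.f, -1.f); glVertex2f(1.f, -1.f);
    glVertex2f(1.f, 1.f);   glVertex2f(-1.f, 1.f);
    glEnd();
    glDisable(GL_STENCIL_TEST);
}
```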

The skeleton is especially useful in aiding the understanding of the information contained in popular regions. In this case, we can differentiate between two different scenarios depending on whether or not the movement of entities is subject to some constraints.

6.3.1 Unconstrained movement

The skeleton can be used when entities have no movement constraints, thus providing a schematization of the most popular region. In Fig. 13, again we can see the ant trajectory paths, their weak popularity map and the corresponding skeleton. Notice that, in this case, the skeleton provides schematic and interesting information compared to the set of trajectory paths.

6.3.2 Constrained movement

When the movement of entities is subject to some constraints, as in the road network case, the skeleton provides a good schematization of the popular regions. Figure 14 shows the trajectory paths of a real set of cars moving on a road network, their popularity map and their skeleton. For the specific case of trajectory paths on road networks, whenever the sampling rate of the input data is dense enough to allow the trajectory paths to follow the roads, the skeleton tends to match the most popular roads.


Fig. 14 a A set of real lorry trajectory paths, b their weak popularity map and c the corresponding skeleton

Notice that, in the special case of trajectories on road networks, the problem we solve is completely different from that of previously presented work, which includes (a) hot routes, where general paths followed by multiple moving objects are detected [21,30]; (b) trajectory patterns, providing a sequence of spatial regions frequently visited in the order specified by the sequence [13]; and (c) interesting places in trajectories, allowing a semantic analysis of stops and moves [34]. The problem we solve identifies popular places, which are places that have been visited many times; however, we do not differentiate movements in opposing directions, nor do we maintain the order in which the places are visited. Popular places can be used, for instance, to determine which parts of the city are noisier or more polluted.

7 Additional constraints

To improve the understanding of trajectory path data, our algorithms handle constraints on both the input trajectory paths to be analyzed and the extracted popular places.

As an input constraint, we consider the time aspect of the initial trajectories by assigning specific time slots to the analysis of the popular places. The user can select a time interval (Fig. 15), and the corresponding popular places are computed by considering only the sub-trajectories contained within the given time interval, together with their associated sub-trajectory paths. By considering different time slots, we are able to determine how popular places change over time. Notice that this additional constraint is the only point at which the time aspect of the trajectories is taken into account, because the problem tackled here does not otherwise consider it.
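As an illustration of this input constraint, a time-slot filter can be as simple as the following sketch; the trajectory structure and field names are hypothetical, since the internal data layout of our implementation is not detailed here.

#include <vector>

// Hypothetical trajectory sample: position plus its time stamp (in seconds).
struct Sample { double x, y, t; };
using Trajectory = std::vector<Sample>;

// Keep, for every entity, only the sub-trajectory whose samples fall inside [t0, t1];
// the popularity map is then computed from these sub-trajectory paths only.
std::vector<Trajectory> restrictToTimeSlot(const std::vector<Trajectory>& trajectories,
                                           double t0, double t1) {
    std::vector<Trajectory> result;
    for (const Trajectory& traj : trajectories) {
        Trajectory sub;
        for (const Sample& s : traj)
            if (s.t >= t0 && s.t <= t1) sub.push_back(s);
        if (sub.size() >= 2)            // at least one edge is needed to form a path
            result.push_back(sub);
    }
    return result;
}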


Fig. 15 a A real data set of trajectory paths tracking Athens school buses, b the weak criterion popularity map between 0:00 and 6:00 h and c between 8:00 and 15:00 h

Fig. 16 a Popular regions for the ant data set. b The resulting output after applying the area constraint. c Visualization of the Pr,k(T) skeleton for the 'State fair' data set and d the resulting output after applying the length constraint

As an output constraint, given that popular places can be excessively small, the user can require a minimum area/length for the popular regions/paths to be visualized (Fig. 16). This is especially useful when we use the strong criterion where, as shown in Fig. 5b, some reported areas/paths are very small/short because of the small angle between trajectory path edges corresponding to two consecutive time steps.
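One straightforward (CPU-side, not GPU-based) way to realize the minimum-area output constraint is to label the connected components of the binary popular-region image and discard those whose pixel count falls below the chosen threshold, as in the following sketch; the names and the 4-connectivity choice are illustrative assumptions.

#include <queue>
#include <vector>

// Remove from a binary image (1 = popular region pixel) every 4-connected
// component with fewer than minPixels pixels.
void dropSmallRegions(std::vector<int>& img, int W, int H, int minPixels) {
    std::vector<int> label(W * H, 0);   // 0 = unvisited
    int next = 1;
    for (int start = 0; start < W * H; ++start) {
        if (!img[start] || label[start]) continue;
        // Flood fill one component, collecting its pixels.
        std::vector<int> comp;
        std::queue<int> q;
        q.push(start); label[start] = next;
        while (!q.empty()) {
            int p = q.front(); q.pop();
            comp.push_back(p);
            int x = p % W, y = p / W;
            const int nb[4] = { p - 1, p + 1, p - W, p + W };
            const bool ok[4] = { x > 0, x < W - 1, y > 0, y < H - 1 };
            for (int i = 0; i < 4; ++i)
                if (ok[i] && img[nb[i]] && !label[nb[i]]) { label[nb[i]] = next; q.push(nb[i]); }
        }
        // Erase the whole component if it does not reach the minimum area.
        if ((int)comp.size() < minPixels)
            for (int p : comp) img[p] = 0;
        ++next;
    }
}

In practice, the pixel threshold would be derived from the user's minimum area divided by the real-world area represented by one pixel.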


Table 1 Detailed information of the real data sets

Data set     n     τ         Sam. rate (s)   Area (m^2)     Error (m)
                                                            512 × 512   1024 × 1024
Buses        145   66,096    30              25 × 10^6      6.90        3.45
Trucks       276   112,203   10              400 × 10^6     27.62       13.81
Cars         340   87,031    30              18.2 × 10^6    12.56       6.28
State Fair   2     5,861     30              2.29 × 10^6    1.66        0.83

8 Experimental results

The aim of this section is to evaluate our algorithms from a running time efficiency point of view. We have implemented an application that makes it easy to change the parameters of our problems and to check the output. Tests have been run on an Intel Core 2 CPU at 2.13 GHz with 2 GB of RAM and an NVIDIA GeForce GTX 480 GPU.

The methods presented for computing and visualizing popular places have been tested on the following data sets.

– Buses and Trucks: A set of school buses and trucks moving in the Athens metropolitan area, extracted from [33]. The data were obtained with the WGS84 reference system for latitude and longitude.

– Cars: Real trajectory paths tracked with a GPS placed in a set of cars moving around the area of Girona, Spain.

– State Fair: A daily GPS track collected from human movement in the NC State Fair held in North Carolina [7]. Positions are stored as the distance to a reference system located in the surrounding area.

– Traffic and animal simulations: Two synthetic data sets generated with NetLogo [36]. The Traffic simulation is a basic simulation of entities moving along streets, where each entity moves one block per time step and a new random direction is chosen at each crossroad. The Animal simulation simulates animals moving on a terrain with no movement restrictions.

– Soccer simulation: A synthetic data set containing the simulation of soccer players during a match. The positions are stored as the distance to its own reference system located in the middle of the soccer field.

Table 1 presents detailed information on the real data sets: the number of entities, the total number of time steps, the sampling rate, the area covered and the error produced for different screen resolutions. Note that the 'Traffic and animal simulations' and 'Soccer simulation' data sets are not present in the table because they are synthetic, so it makes no sense to report their sampling rate or covered area. The purpose of these simulations is to test the algorithm with data sets containing a large number of entities and time steps, given the difficulty of finding data sets which are both real and large. The traffic and animal simulations have 1,000 and 10,000 entities, respectively, with a total of 30,000 time steps for the traffic simulation and 200,000 for the animal simulation. In the 'Soccer simulation' data set, we simulate soccer players during a match, obtaining a total of 4.14 × 10^6 time steps.

The times presented are total running times, including the transfer times from and to the GPU, the initialization times, etc. Each running time reported has been computed as the average of 10 executions with the same parameter values.


Table 2 Computation times (s) when applying the presented methods to compute Mr(T) under the different counting criteria

                        Weak                    Strong
T              r        Stencil     Shader      Shader      Stencil
State fair     2        0.006       0.011       0.441       0.731
               5        0.006       0.011       0.439       0.721
               10       0.006       0.010       0.439       0.721
Animal sim.    2        0.201       0.562       12.850      22.888
               5        0.201       0.561       12.852      22.889
               10       0.202       0.563       12.850      22.902

We cannot compare our execution times against others because, to the best of our knowledge, no other implementations exist. Benkert et al. presented in [3] an approach to report popular places, but their proposal is from a theoretical point of view: no running times are reported, and, as far as we know, no other papers about this particular pattern have been published.

In Sect. 4, we have presented two approaches for computing Mr(T), depending on the counting criterion used. Table 2 shows the total running times needed to compute Mr(T) with the algorithms using each approach for both criteria. 'Stencil' refers to the algorithm presented for the weak criterion, which uses the stencil buffer to count the fragments, and 'Shader' refers to that presented for the strong criterion, which uses a shader simulating the stencil. We can see that, as stated in Sect. 4, 'Stencil' is the best option for the weak criterion and 'Shader' for the strong criterion.
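To give an idea of what the 'Stencil' counting amounts to, the sketch below lets every rendered fragment increment the per-pixel stencil value and then reads the counters back; it is only a schematic excerpt under the assumption that the primitives to count are issued by a user-supplied callback, not a reproduction of the pipeline described in Sect. 4.

#include <GL/gl.h>
#include <vector>

// Count, per pixel, how many of the rendered primitives cover it by letting
// every incoming fragment increment the stencil buffer.
std::vector<GLubyte> countCoverage(void (*renderPrimitives)(), int W, int H) {
    glClear(GL_STENCIL_BUFFER_BIT);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);                   // every fragment passes the stencil test
    glStencilOp(GL_KEEP, GL_INCR, GL_INCR);              // and increments the per-pixel counter
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // only the counters are needed, not colors

    renderPrimitives();                                  // e.g., one primitive per entity and region

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDisable(GL_STENCIL_TEST);

    // Read the per-pixel counts back to the CPU.
    std::vector<GLubyte> counts(static_cast<size_t>(W) * H);
    glReadPixels(0, 0, W, H, GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, counts.data());
    return counts;
}

Since a standard stencil buffer holds 8 bits per pixel, counts saturate at 255, which is one practical limitation of stencil-based counting.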

Figures 17 and 18 show computation and visualization running times for different k and r values. Figure 17 provides the running times when using a screen resolution of 512 × 512 and Fig. 18 those for a 1024 × 1024 screen resolution. Even though the largest discretization size supported by the GPU is 16384 × 16384, we have chosen these resolutions because, as seen in Table 1, the error produced is acceptable.

In both Figs. 17 and 18, panels a and b present the time in seconds needed to compute Mr(T) and a specific Pr,k(T) for the weak and strong criteria, respectively. Note that the running times are only slightly affected by the different r and k input values, showing good scalability in these parameters. We have provided the total time to compute Mr(T) and extract one Pr,k(T); however, for a fixed r and a given data set, obtaining Pr,k(T) from Mr(T) is very fast. For instance, for the 'Animal sim.' data set, once M5(T) is computed, it takes 8.81, 9.32 and 9.11 ms to compute Pr,k(T) for k = 2, k = 5 and k = 10, respectively. Comparing both criteria, we can see how, as expected, the computation times are higher under the strong criterion, especially when the number of time steps increases. For instance, on the animal simulation data set, the computation is up to six times faster under the weak criterion than under the strong one. This is because the algorithm directly depends on the number of painted fragments, so the more time steps there are, the slower the strong criterion computation becomes. We also tested the strong criterion algorithm with the soccer data set: with a 1024 × 1024 screen resolution, it took 302.78 s to compute Mr(T). These experiments show the scalability of our algorithm, which imposes no limits on the input data. As for the number of time steps, the data sets provided range from 5,861 to 4.15 × 10^6 time steps.


Fig. 17 Running times for a 512 × 512 screen resolution

The times in seconds needed to compute the skeleton from an already computed Pr,k(T) are shown in panels c and d of Figs. 17 and 18. The skeleton computation is not affected by the input parameters, nor by the number of entities. In fact, the times differ without showing any correlation with r, k, the number of entities or the number of time steps.


Fig. 18 Running times for a 1024 × 1024 screen resolution


This is because the skeleton computation depends on the distribution of the boundary points of Pr,k(T), which is mainly related to the trajectory path distribution. When comparing panels (c) and (d) of Figs. 17 and 18, we can see that the only times affected by increasing the screen resolution are those of the animal and traffic simulations, especially with r = 50 and k = 5.

Finally, panels e and f of Figs. 17 and 18 provide the Pr,k(T) and skeleton visualization times in milliseconds, respectively. The visualization of Pr,k(T) depends on the range of popularity values obtained and on the screen resolution. In any case, the visualization times are very small.

9 Conclusions

In this paper, we have studied the problem of finding popular places, regions that are visited by at least a given number of entities. We have provided two models, depending on whether what is required is the number of different entities that have visited a place or the total number of times the different entities have visited it. We have also introduced the notion of the popularity map, a planar structure from which it is easy to report popular places. Working toward practical solutions and in order to obtain good running times, we have presented algorithms to report popular regions and their schematization that take advantage of the parallel computing capabilities of graphics processing units. The presented algorithms have no restrictions on either the number of entities or the number of time steps, and they let the user specify a desired maximal error in the solutions obtained. Although our proposed approach means that the reported solutions are approximate, this is acceptable since the data of moving entities are approximate too. We have also described how to visualize and obtain a schematization of the reported solutions, which in turn facilitates the understanding of trajectory path data. To allow a deeper study of the trajectory sets, our implementation allows constraints to be added on both the input and the output. In the case of the input trajectory paths, rather than considering whole trajectories, only those time steps in a specified time slot are taken into account; for the output, those parts of the extracted popular places which are considered too small are deleted. Finally, we have reported and discussed experimental results obtained with our implementation of the presented algorithms, showing the efficiency and scalability of our approach.

Acknowledgments We thank the reviewers for their suggestions and comments. The authors are partially supported by the Spanish MCI grant TIN2010-20590-C02-02.

References

1. Andersson M, Gudmundsson J, Laube P, Wolle T (2008) Reporting leaders and followers among trajectories of moving point objects. GeoInformatica 12(4):497–528

2. Andrienko GL, Andrienko NV (2008) A visual analytics approach to exploration of large amounts of movement data. In: Sebillo M, Vitiello G, Schaefer G (eds) VISUAL, Lecture notes in computer science, vol 5188. Springer, pp 1–4

3. Benkert M, Djordjevic B, Gudmundsson J, Wolle T (2010) Finding popular places. Int J Comput Geom Appl (IJCGA) 20(1):19–42

4. Benkert M, Gudmundsson J, Hübner F, Wolle T (2008) Reporting flock patterns. Comput Geom 41(3):111–125

5. Bogorny V, Shekhar S (2010) Spatial and spatio-temporal data mining. In: Proceedings of IEEE international conference on data mining ICDM'2010, p 1217

6. Coll N, Fort M, Madern N, Sellarès JA (2007) Multi-visibility maps of triangulated terrains. Int J Geogr Inf Sci 21(10):1115–1134

7. Dartmouth College (2008) CRAWDAD. http://crawdad.cs.dartmouth.edu/. Accessed Apr 2013

8. Dodge S, Weibel R, Lautenschutz AK (2008) Towards a taxonomy of movement patterns. Inf Vis 7(3–4):240–252

9. Fang W, Lu M, Xiao X, He B, Luo Q (2009) Frequent itemset mining on graphics processors. In: DaMoN '09: proceedings of the fifth international workshop on data management on new hardware. ACM, New York, NY, USA, pp 34–42

10. Fort M, Sellarès JA, Valladares N (2010) Computing popular places using graphics processors. In: Proceedings of SSTDM'10 in cooperation with IEEE ICDM'10, IEEE Computer Society, pp 233–241

11. Fort M, Sellarès JA, Valladares N (2010) Computing popularity maps with graphics hardware. In: Proceedings of the 27th European workshop on computational geometry, pp 233–240

12. Gajentaan A, Overmars M (1995) On a class of O(n^2) problems in computational geometry. Comput Geom Theory Appl 5:165–185

13. Giannotti F, Nanni M, Pedreschi D, Pinelli F (2007) Trajectory pattern mining. In: Proceedings of 13th ACM SIGKDD, San Jose, California, USA, pp 330–339

14. Gonzalez RC, Woods RE (2008) Digital image processing, 3rd edn. Prentice-Hall, Inc., Englewood Cliffs, NJ

15. Gudmundsson J, van Kreveld M, Speckmann B (2004) Efficient detection of motion patterns in spatio-temporal data sets. In: Pfoser D, Cruz IF, Ronthaler M (eds) GIS. ACM, New York, pp 250–257

16. Gudmundsson J, van Kreveld MJ (2006) Computing longest duration flocks in trajectory data. In: de By RA, Nittel S (eds) GIS. ACM, New York, pp 35–42

17. Gudmundsson J, van Kreveld MJ, Speckmann B (2007) Efficient detection of patterns in 2D trajectories of moving points. GeoInformatica 11(2):195–215

18. Laube P, Imfeld S, Weibel R (2005) Discovering relative motion patterns in groups of moving point objects. Int J Geogr Inf Sci 19(6):639–668

19. Laube P, van Kreveld M, Imfield S (2004) Finding REMO—detecting relative motion patterns in geospatial lifelines. In: Developments in spatial data handling: 11th international symposium on spatial data handling, pp 201–215

20. Leite PJS, Teixeira JMXN, de Farias TSMC, Teichrieb V, Kelner J (2009) Massively parallel nearest neighbor queries for dynamic point clouds on the GPU. In: SBAC-PAD. IEEE Computer Society, pp 19–25

21. Li X, Han J, Lee JG, Gonzalez H (2007) Traffic density-based discovery of hot routes in road networks. In: SSTD 2007, LNCS, vol 4605, pp 441–459

22. Liao Z-H, Peng W-C (2012) Clustering spatial data with a geographic constraint: exploring local search. Knowl Inf Syst 31(1):153–170

23. Lu E, Lee WC, Tseng V (2010) Mining fastest path from trajectories with multiple destinations in road networks. Knowl Inf Syst 1–29

24. NVIDIA (2011) CUDA programming guide 4.0. Technical report, NVIDIA Corporation

25. NVIDIA (2011) NVIDIA CUDA C programming best practices guide 4.0. Technical report, NVIDIA Corporation

26. Ong R, Wachowicz M, Nanni M, Renso C (2010) From pattern discovery to pattern interpretation in movement data. In: ICDM'10, pp 527–534

27. Owens JD, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn AE, Purcell TJ (2007) A survey of general-purpose computation on graphics hardware. Comput Graph Forum 26(1):80–113

28. Pelekis N, Kopanakis I, Kotsifakos EE, Frentzos E, Theodoridis Y (2011) Clustering uncertain trajectories. Knowl Inf Syst 28(1):117–147

29. Rinzivillo S, Pedreschi D, Nanni M, Giannotti F, Andrienko NV, Andrienko GL (2008) Visually driven analysis of movement data by progressive clustering. Inf Vis 7(3–4):225–239

30. Sacharidis D, Patroumpas K, Terrovitis M, Kantere V, Potamias M, Mouratidis K, Sellis T (2008) On-line discovery of hot motion paths. In: EDBT'08, March 2008, France, pp 392–402

31. Segal M, Akeley K (1994) The design of the OpenGL graphics interface. Technical report, Silicon Graphics Computer Systems

32. Siddiqi K, Pizer S (2008) Medial representations: mathematics, algorithms and applications. Springer, Berlin

33. Theodoridis Y (2011) R-Tree portal. http://www.chorochronog.org/. Accessed Apr 2013

34. Tietbohl A, Bogorny V, Kuijpers B, Alvares LO (2008) A clustering-based approach for discovering interesting places in trajectories. In: SAC'08, March 2008, Brazil

35. Trasarti R, Pinelli F, Nanni M, Giannotti F (2011) Mining mobility user profiles for car pooling. In: KDD'11, pp 1190–1198

36. Wilensky U (1999) NetLogo. http://cc/northwestern.edu/netlogo/. Accessed Apr 2013

Author Biographies

Marta Fort received her B.Sc. degree in Mathematics from the Universitat Politècnica de Barcelona (Spain) and her Ph.D. degree in Software Engineering from the Universitat de Girona (Spain). Her research area is computational geometry and its application in fields such as computer graphics, geographic information systems and facility location. This research is carried out within the scope of the Geometry and Graphics Group (GGG), a research group in computational geometry and computer graphics of the Universitat de Girona (Spain).

J. Antoni Sellarès received his B.Sc. degree in Mathematics from the Universitat de Barcelona (Spain) and his Ph.D. degree in Mathematics from the Universitat Politècnica de Catalunya (Spain). His research area is computational geometry and its application in fields such as computer graphics, geographic information systems, facility location and geometrical statistics. This research is carried out within the scope of the Geometry and Graphics Group (GGG), a research group in computational geometry and computer graphics of the Universitat de Girona (Spain).

Nacho Valladares obtained his degree in Computer Science at the Universitat Pompeu Fabra (Spain) in 2008 and his master's degree in Computational Geometry and Graphics at the Universitat de Girona (Spain) in 2009. He is currently working on his Ph.D. in Computer Graphics and Computational Geometry. His research area is computational geometry and its application in fields such as computer graphics, geographic information systems and geometry processing. This research is carried out within the scope of the Geometry and Graphics Group (GGG), a research group in computer graphics.
