Managing Uncertainty in Spatial and Spatio-temporal Data

Andreas Züfle¹, Goce Trajcevski², Tobias Emrich, Matthias Renz¹, Hans-Peter Kriegel¹, Nikos Mamoulis³, Reynold Cheng³

¹ LMU Munich, ² NWU Evanston, ³ HKU Hong Kong, ⁴ USC Los Angeles



Aim of this tutorial …

› Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal data

› A tutorial, not a survey

› Get the big picture …
  – NOT in terms of a long list of recent methods and algorithms
  – BUT in terms of general concepts commonly used in this field

Outline

› Tutorial decomposed into three parts
  – Uncertain Spatial Data (Andreas Züfle)
  – Uncertain Spatio-Temporal Data (Geometric Approach) (Goce Trajcevski)
  – Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias Emrich)

› Please feel free to ask questions at any time during the presentation

› The latest version of these slides will be made available within the next week:
  http://www.dbs.ifi.lmu.de/~zuefle

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

Geo-Spatial Data

• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality

• Great research potential
  • New applications
  • Innovative research
  • Economic boost

• "$600 billion potential annual consumer surplus from using personal location data" [1]

[1] McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity. June 2011


Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: continuous function

[Figure: GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009]

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

[Figure: check-in data. Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011]

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

[Figure: check-in data. Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011]

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Answering Queries using PWS

Let
• $\mathcal{D}$ be an uncertain database
• $\mathcal{W}$ be the set of possible worlds of $\mathcal{D}$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise

The probability that a query predicate $\varphi$ holds on an uncertain database $\mathcal{D}$ is defined as

$$P(\varphi, \mathcal{D}) = \sum_{w \in \mathcal{W}} I(\varphi, w) \cdot P(w)$$
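The definition above can be illustrated by brute force. The following is a minimal sketch, assuming a discrete model with independent objects; the toy database, the predicate and all function names are hypothetical and only serve the illustration.

```python
from itertools import product

# Hypothetical discrete uncertain database: every object has alternative
# positions (x, y), each with a probability; probabilities sum to one per object.
uncertain_db = {
    "A": [((1.0, 1.0), 0.8), ((5.0, 5.0), 0.2)],
    "B": [((0.5, 0.0), 0.4), ((4.0, 0.0), 0.6)],
}


def possible_worlds(db):
    """Yield (world, probability) pairs, assuming objects are independent."""
    names = sorted(db)
    for combo in product(*(db[name] for name in names)):
        world = {name: pos for name, (pos, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob


def query_probability(db, predicate):
    """Sum the probabilities of all worlds in which the predicate holds."""
    return sum(p for world, p in possible_worlds(db) if predicate(world))


def some_object_near_origin(world, radius=2.0):
    """Example predicate: at least one object lies within `radius` of (0, 0)."""
    return any(x * x + y * y <= radius * radius for x, y in world.values())


result = query_probability(uncertain_db, some_object_near_origin)
print(result)
```

The number of enumerated worlds is the product of the numbers of alternatives per object, so this only works for very small databases.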

Possible Worlds Example II

[Figure: example database with uncertain objects labeled A to Z]

Too many possible worlds

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No
• "The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
• Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] Dalvi, N. N., and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with objects C, D, H and I]

Ignore Uncertainty (Data Cleaning)

• Replace uncertain objects by a deterministic "best guess"
  • Expected positions
  • Most-likely positions
  • …

• Query results are not reliable

• Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalence classes of possible worlds

Equivalent Worlds: An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalence classes of possible worlds

[Figure: query region around q with the remaining objects C, D, H and I]

Equivalent Worlds: An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

[Figure: query region around q with the remaining objects C, D, H and I]

Generating Functions

[Figure: query region around q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects; substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

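Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left(1 - p_i + p_i \cdot x\right) = \sum_{i=0}^{N} c_i x^i$$

In the expanded polynomial, the coefficient $c_i$ of monomial $x^i$ equals the probability that exactly $i$ objects are inside the query region

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)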
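The polynomial expansion can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; it assumes that the figure labels 0.8, 0.2 and 0.4 are the probabilities of three independent objects being inside the query region, and the function name is hypothetical.

```python
def count_distribution(inside_probs):
    """Expand prod_i (1 - p_i + p_i * x); coeffs[k] = P(exactly k objects inside)."""
    coeffs = [1.0]  # the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object is outside: count unchanged
            nxt[k + 1] += c * p       # object is inside: count increases by one
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities, read off the figure labels.
dist = count_distribution([0.8, 0.2, 0.4])
for k, prob in enumerate(dist):
    print(f"P(count = {k}) = {prob:.3f}")
```

Under these assumptions the expansion is 0.096 + 0.472x + 0.368x² + 0.064x³. Each multiplication costs time linear in the current degree, so N objects need O(N²) operations instead of 2^N possible worlds.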

Count Queries on Uncertain Data

Example

[Figure: query region around q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
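The sampling scheme can be sketched for the count query of the running example. This is a minimal sketch under the assumption of three independent objects with inside-probabilities 0.8, 0.2 and 0.4 (the figure labels); the function name and the fixed seed are illustrative choices.

```python
import random


def sample_count_distribution(inside_probs, num_samples, seed=0):
    """Draw possible worlds independently and tally the count-query result."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    tallies = [0] * (len(inside_probs) + 1)
    for _ in range(num_samples):
        # One sampled world: each object is inside with its own probability.
        count = sum(1 for p in inside_probs if rng.random() < p)
        tallies[count] += 1
    return [t / num_samples for t in tallies]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
print(estimate)
```

With only 100 samples the estimates can deviate noticeably from the exact distribution, which is what motivates attaching confidence information to them.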

Sampling: Example

[Figure: query region around q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

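Sampling: Confidences

[Figure: query region around q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability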
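The interval can be reproduced numerically. The sketch below assumes an estimator of 0.54 from 100 samples and a significance level of 0.05; these values are not stated in the text, but they happen to yield the interval [0.442, 0.638] given above.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, num_samples, alpha=0.05):
    """Wald confidence interval for a probability estimated from samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / num_samples)
    return p_hat - half_width, p_hat + half_width


# Assumed values: estimator 0.54 from |S| = 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(f"[{low:.3f}, {high:.3f}]")
```

The interval width shrinks with the square root of the number of samples, so halving it requires four times as many sampled worlds.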

Uncertain Spatial Data Management: Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty: the flip-side

Model of a trajectory

[Figure: 3D trajectory over the axes X, Y and Time (present time marked) and the corresponding 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

Mobility Models

• Sequence of (location, time) updates (e.g., GPS-based)

• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

• (location, time, velocity vector) updates

Modeling Uncertainty: Location Uncertainty Models

• Various sources' imprecision (GPS, sensors)

• Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed: exact location samples
  • Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

Modeling Uncertain Trajectories: Model 2

Constraints
• Motion along road network
• Uncertainty restricted to road segments

[Figure: space-time prism on a road network between samples p1 (at t1) and p2 (at t2), with a query location q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement (e.g., possible routes to be taken)

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t(m-1), te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t11, T = t1 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t(m-1), te, Dm). Nodes on deeper levels, e.g. (Tr_i11, tb, t11, D11), refine sub-intervals of their parents]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q with the uncertainty disks of Tr1 to Tr5 and the distances Rmin, Rmax and Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} pdf_i(x, y) \, dx \, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

where $pdf^{WD}_j$ is the derivative of $P^{WD}_j$ with respect to $R_d$; i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax …)
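The pruning observation and the NN-probabilities can be approximated by sampling instead of evaluating the integrals. The sketch below assumes uniform pdfs over disks and takes Rmax as the smallest maximum distance of any object from Q; the example disks and all function names are hypothetical.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk (square-root trick for the radial part)."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(phi), center[1] + r * math.sin(phi)


def prune(q, disks):
    """Keep the indices of objects whose minimum distance from q is <= Rmax."""
    r_max = min(math.dist(q, c) + r for c, r in disks)
    return [i for i, (c, r) in enumerate(disks)
            if max(math.dist(q, c) - r, 0.0) <= r_max]


def nn_probabilities(q, disks, num_samples=20000, seed=0):
    """Monte-Carlo estimate of each object's probability of being the NN of q."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(num_samples):
        dists = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[dists.index(min(dists))] += 1
    return [w / num_samples for w in wins]


# Hypothetical example: crisp query point and three uncertainty disks.
query = (0.0, 0.0)
objects = [((2.0, 0.0), 1.0), ((0.0, 2.5), 1.0), ((10.0, 0.0), 1.0)]
candidates = prune(query, objects)
probabilities = nn_probabilities(query, objects)
print(candidates, probabilities)
```

In this example the third object is pruned, and the sampled estimate consistently assigns it probability zero.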

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable … Example:

[Figure: uncertain query location TrQ with the uncertainty disks of Tr1 to Tr5, the distances Rmin and Rmax, and two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide) …

[Figure: uniform pdfs of Trq, Tr1 and Tr2 over uncertainty disks of diameter 2r, plotted over the x-y plane]

Example (assuming a uniform pdf of possible locations): just two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P\left(\|V_i - V_q\| \le R_d\right)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
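The convolution argument can be illustrated in a simplified setting. The sketch below assumes one-dimensional, discrete and independent location variables (the slides use continuous 2D pdfs); the pmfs and function names are hypothetical.

```python
def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete variables.

    Both arguments map a location value to its probability; the result is
    the convolution of pmf_i with the pmf of -V_q.
    """
    out = {}
    for v_i, p_i in pmf_i.items():
        for v_q, p_q in pmf_q.items():
            out[v_i - v_q] = out.get(v_i - v_q, 0.0) + p_i * p_q
    return out


def within_distance(pmf_i, pmf_q, r_d):
    """P(|V_i - V_q| <= r_d), read off the pmf of the difference variable."""
    return sum(p for d, p in difference_pmf(pmf_i, pmf_q).items() if abs(d) <= r_d)


# Hypothetical uniform pmfs over three locations each.
pmf_i = {4: 1 / 3, 5: 1 / 3, 6: 1 / 3}
pmf_q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
diff = difference_pmf(pmf_i, pmf_q)
expected_diff = sum(d * p for d, p in diff.items())
prob_within_3 = within_distance(pmf_i, pmf_q, 3)
print(diff, expected_diff, prob_within_3)
```

The expected value of the difference equals the difference of the expected values (5 - 1 = 4 here), which is the discrete analogue of the centroid property.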

[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq), plotted over the x-y plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs),
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{x,iq}^2 + V_{y,iq}^2$$

where B and C depend on the relative position at the start of the interval; this is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) = 0)

Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set d_iq(t) = d_jq(t)). Call such pair-wise intersections critical time-points

Example: Lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
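Critical time-points and a lower envelope can be sketched as follows. This is a naive illustration, not the divide-and-conquer construction of the tutorial: it assumes linear relative motion, so that each squared distance is a quadratic A t² + B t + C, and it compares all pairs of functions. All names and the example values are hypothetical.

```python
import math


def distance_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative position (x0 + vx t, y0 + vy t)."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_times(ci, cj, tb, te):
    """Times strictly inside (tb, te) where two distance functions intersect (at most two)."""
    a, b, c = (u - v for u, v in zip(ci, cj))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted({t for t in roots if tb < t < te})


def lower_envelope(coeffs, tb, te):
    """Naive envelope: list of (t_start, t_end, index of the nearest trajectory)."""
    cuts = {tb, te}
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            cuts.update(critical_times(coeffs[i], coeffs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = 0.5 * (start + end)  # the order cannot change between two cuts
        best = min(range(len(coeffs)),
                   key=lambda k: coeffs[k][0] * mid * mid + coeffs[k][1] * mid + coeffs[k][2])
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], end, best)  # merge with the previous piece
        else:
            pieces.append((start, end, best))
    return pieces


# Hypothetical example: one object moving away from the query, one approaching it.
coeffs = [distance_coeffs(1.0, 0.0, 1.0, 0.0), distance_coeffs(4.0, 0.0, -1.0, 0.0)]
envelope = lower_envelope(coeffs, 0.0, 3.0)
print(envelope)
```

In the example the two distance functions cross once, at t = 1.5, so the envelope consists of two pieces with different nearest trajectories.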

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1 and TR2, with critical time-points t11 and t12) and LE34 (of TR3 and TR4, with critical time-point t31) are merged into LE1234 over [tb, te], introducing the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2 and t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: the pdf of selecting a possible path is uniform.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).

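Combine the two probabilities …

[Figure: the possible paths of object a and its possible locations when t = 4, with segment m-p1 highlighted]

Probability of falling inside segment m-p1:
0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\left(S \mid PL^{a}_{p}(t)\right)$$

where Pr(p) is the probability of path p and the second factor is the probability of being inside S given the possible locations on p (e.g., uniform pdf)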
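The combination of the two probabilities can be sketched with a small calculation. The sketch assumes a uniform pdf over the possible paths and a uniform pdf over the possible locations on each path; the edge names follow the example, while the interval bounds and function names are hypothetical.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of two intervals (zero if they are disjoint)."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def prob_in_segment(paths, segment):
    """Sum over paths of Pr(path) * Pr(inside segment | possible locations on path).

    `paths` holds (path probability, edge, lo, hi): at the query time the object
    is uniformly distributed over [lo, hi] along `edge` if it took that path.
    `segment` is (edge, lo, hi).
    """
    seg_edge, seg_lo, seg_hi = segment
    total = 0.0
    for path_prob, edge, loc_lo, loc_hi in paths:
        if edge == seg_edge and loc_hi > loc_lo:
            fraction = overlap(loc_lo, loc_hi, seg_lo, seg_hi) / (loc_hi - loc_lo)
            total += path_prob * fraction
    return total


# Hypothetical possible locations at t = 4 for two equally likely paths.
paths_at_t4 = [
    (0.5, "v1v5", 2.0, 6.0),  # path along edge v1v5
    (0.5, "v1v3", 1.0, 3.0),  # path along edge v1v3
]
result = prob_in_segment(paths_at_t4, ("v1v5", 2.0, 4.0))
print(result)
```

The queried segment covers half of the possible locations on a path that is taken with probability 0.5, which gives 0.5 × 0.5 = 0.25 as in the example.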

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure stored for vertices
    › On-disk, retrieve on-demand

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query location q; the expansion tree is "bubbling" along road-network segments. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
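The network expansion step can be sketched as a distance-bounded Dijkstra search. This is an illustrative simplification that works on vertices rather than on partial edges; the adjacency-list format, the example network and the function name are assumptions.

```python
import heapq


def expansion_tree(graph, q, r):
    """Vertices with network distance < r from q, with parent pointers.

    `graph` maps a vertex to a list of (neighbor, edge length) pairs.
    """
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


# Hypothetical undirected road network.
network = {
    "q": [("A", 1.0), ("D", 5.0)],
    "A": [("q", 1.0), ("B", 2.0)],
    "B": [("A", 2.0), ("C", 3.0)],
    "C": [("B", 3.0)],
    "D": [("q", 5.0)],
}
distances, parents = expansion_tree(network, "q", r=4.0)
print(distances, parents)
```

The parent pointers encode the expansion tree; only the movement R-trees of edges along this tree would need to be fetched in the filtering step.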

Temporal-continuous query
  – Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty: the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
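Error-bounded data reduction can be sketched with the basic Douglas-Peucker algorithm. Note that this is the plain recursive version, not the sped-up variant of the cited paper; the function names and the example polyline are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline so that no dropped point is farther than epsilon."""
    if len(points) < 3:
        return list(points)
    index, d_max = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [points[0], points[-1]]  # everything in between is within the error
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point is shared, keep it once


route = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
simplified = douglas_peucker(route, epsilon=0.5)
print(simplified)
```

The chosen epsilon plays the role of the acceptable error: every dropped location is guaranteed to lie within epsilon of the simplified route, which turns data reduction into a bounded location uncertainty.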

Uncertainty: the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not just a "Z" axis … one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far …

Stochastic methods for uncertain spatial data
• Probabilistic results
• Time dimension

vs.

Geometric models for uncertain spatio-temporal data
• Probabilistic results
• Time dimension

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

A highway example …

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, with a query window Q; assuming a uniform distribution, the window covers 50% of the possible positions at one time instant and 30% at the next]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution …

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

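A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: a row of holes in a wooden board; from an inner hole the ball moves left with probability 0.4, stays with 0.2 and moves right with 0.4; at the border the labels are 0.6 and 0.4]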

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

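How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$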

How can we model this?

› First hit

$$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$$

› Second hit

$$(0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0.16 \;\; 0.16 \;\; 0.36 \;\; 0.16 \;\; 0.16 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$$

› 40th hit

$$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{40} \approx (0.08 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.08)$$
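The propagation of the ball's position distribution can be sketched with plain vector-matrix products. This is a minimal sketch with hypothetical function names; the transition probabilities are those of the matrix M in the example.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (row: from bucket, column: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6            # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4    # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, matrix):
    """One transition: row vector times matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third bucket
first = step(start, M)
second = step(first, M)

# The values listed for the 40th hit form the stationary distribution of M:
# one more transition leaves them unchanged.
stationary = [0.08] + [0.12] * 7 + [0.08]
after_one_more = step(stationary, M)

print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
```

Because M is tridiagonal, each step only touches three entries per state; a sparse representation would reduce the cost per step from quadratic to linear in the number of states.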

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M on the edges]

Note: We have an exponential number of possible paths the car might take

State distributions, starting in s1:
  t = 0: (1, 0, 0)
  t = 1: (0, 0, 1)
  t = 2: (0, 0.8, 0.2)
  t = 3: (0.48, 0.16, 0.36)

Of the probability 0.2 that is outside the window at t = 2, 0.16 moves to s2 and 0.04 remains in s3 at t = 3: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories (those that have intersected the window) and two matrices

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Distributions over (s1, s2, s3, s4):
  t = 0: (1, 0, 0, 0)
  t = 1: (0, 0, 1, 0)
  t = 2: (0, 0, 0.2, 0.8)
  t = 3: (0, 0, 0.04, 0.96)
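The two-matrix solution can be sketched as follows. The sketch assumes that M⁺ is applied for transitions into time steps inside the query interval and M⁻ otherwise, and that the interval does not contain t = 0; this reading reproduces the example numbers. All function names are hypothetical.

```python
def step(dist, matrix):
    """One transition: row vector times matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing state for worlds that hit the window."""
    n = len(m)
    hit = n  # index of the additional state
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing[:]]
    m_plus = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            # Transitions into a window state are redirected to the hit state.
            new_row[hit if j in window_states else j] += p
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(m, start, window_states, window_times, horizon):
    """Probability of being in a window state at least once during window_times."""
    m_minus, m_plus = window_matrices(m, window_states)
    dist = list(start) + [0.0]
    for t in range(1, horizon + 1):
        dist = step(dist, m_plus if t in window_times else m_minus)
    return dist[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States s1, s2 have indices 0, 1; the car starts in s1; T = [2, 3].
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, horizon=3)
print(round(result, 2))
```

The cost is one sparse vector-matrix product per time step, independent of the exponential number of possible paths.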

Multiple Observations

› So far, we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations, we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

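Multiple Observations

Distributions over the classes (S1-, S2-, S3-, S1+, S2+, S3+):
  t = 0: (1, 0, 0, 0, 0, 0)
  t = 1: (0, 0, 1, 0, 0, 0)
  t = 2: (0, 0, 0.2, 0, 0.8, 0)
  t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$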

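Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the window, and obs the observation of the object in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over the classes (S1-, S2-, S3-, S1+, S2+, S3+):

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$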
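The conditioning on a second observation can be sketched by tracking the "not yet hit" and "hit" classes as two separate vectors, which is equivalent bookkeeping to the 2|S| classes above. The sketch assumes the observation in s3 is made at t = 3 and that the window does not contain t = 0; all function names are hypothetical.

```python
def step(dist, matrix):
    """One transition: row vector times matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def hit_given_observation(m, start_state, window_states, window_times,
                          obs_state, obs_time):
    """P(trajectory intersects the window | object observed in obs_state at obs_time)."""
    n = len(m)
    not_hit = [0.0] * n   # classes S_i-
    hit = [0.0] * n       # classes S_i+
    not_hit[start_state] = 1.0
    for t in range(1, obs_time + 1):
        not_hit, hit = step(not_hit, m), step(hit, m)
        if t in window_times:
            # Worlds that are now inside the window switch to the hit classes.
            for s in window_states:
                hit[s] += not_hit[s]
                not_hit[s] = 0.0
    joint_hit, joint_miss = hit[obs_state], not_hit[obs_state]
    # Bayes: P(hit and obs) / (P(hit and obs) + P(not hit and obs)).
    # Raises ZeroDivisionError if the observation itself has probability zero.
    return joint_hit / (joint_hit + joint_miss)


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1 (index 0), window states s1 and s2, T = [2, 3], observed in s3 at t = 3.
posterior = hit_given_observation(M, 0, {0, 1}, {2, 3}, obs_state=2, obs_time=3)
print(round(posterior, 2))
```

For the example this yields 0.32 / (0.32 + 0.04), matching the value of 0.89 obtained above.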

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries with P and oid over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
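One standard way to adapt transition matrices to a later observation is a backward pass over the chain. The sketch below illustrates that general idea and is not taken from the cited paper; `adapt_transitions` and `sample_trajectory` are hypothetical names, and the example assumes a start in s1 and an observation in s3 at t=3.

```python
import random


def adapt_transitions(M, horizon, end_state):
    """Return one adapted matrix per time step such that every sampled
    trajectory ends in `end_state` after `horizon` steps."""
    n = len(M)
    # beta[s] = probability of reaching the observation from state s
    beta = [1.0 if s == end_state else 0.0 for s in range(n)]
    adapted = []
    for _ in range(horizon):
        prev = [sum(M[i][j] * beta[j] for j in range(n)) for i in range(n)]
        step = []
        for i in range(n):
            if prev[i] > 0:
                step.append([M[i][j] * beta[j] / prev[i] for j in range(n)])
            else:
                step.append([0.0] * n)  # state i cannot reach the observation
        adapted.append(step)
        beta = prev
    adapted.reverse()  # adapted[t] now describes the transition t -> t+1
    return adapted


def sample_trajectory(adapted, start_state, rng):
    """Draw one trajectory that is consistent with the observation."""
    path = [start_state]
    for step in adapted:
        row = step[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
adapted = adapt_transitions(M, horizon=3, end_state=2)
path = sample_trajectory(adapted, start_state=0, rng=random.Random(0))
print(path)
```

Every drawn sample satisfies the observation, so no sample has to be rejected.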

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal



6

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

7

Geo-Spatial Data

• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality

• Great research potential
  • New applications
  • Innovative research
  • Economic boost

• “$600 billion potential annual consumer surplus from using personal location data” [1]

[1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011.


Spatio-Temporal Data

• (object, location, time) triples

• Queries:
  • “Find friends that attended the same concert last Saturday”

• Best case: continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011


Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Answering Queries using PWS

Let
• DB be an uncertain database
• W be the set of possible worlds of DB
• φ be a query predicate
• I(φ, w) be an indicator function returning one if predicate φ holds in world w, and zero otherwise

The probability that a query predicate φ holds on an uncertain database DB is defined as

$$P(\varphi, DB) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$
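The definition above can be evaluated naively by enumerating all possible worlds. The sketch below assumes a discrete model with mutually independent objects, so that the probability of a world is the product of its instance probabilities; the toy database, the query radius, and the name `query_probability` are illustrative only.

```python
from itertools import product


def query_probability(db, predicate):
    """db: list of objects, each a list of (instance, probability) alternatives.
    Sums the probabilities of all possible worlds in which `predicate` holds."""
    total = 0.0
    for world in product(*db):  # one instance per object
        world_prob = 1.0
        for _, prob in world:
            world_prob *= prob  # independence assumption
        if predicate([inst for inst, _ in world]):
            total += world_prob
    return total


# Two uncertain objects with two alternative positions each (hypothetical data)
db = [
    [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    [((1.0, 0.0), 0.2), ((5.0, 0.0), 0.8)],
]


def inside(point):
    """Circular query region of radius 2 around the origin."""
    return point[0] ** 2 + point[1] ** 2 <= 4.0


p_any = query_probability(db, lambda world: any(inside(pt) for pt in world))
print(round(p_any, 3))  # 0.6
```

The loop visits every combination of instances, which is exactly the exponential blow-up that the following slides try to avoid.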

Possible Worlds Example II

[Figure: uncertain spatial objects labeled A to Z; the following slides step through individual possible worlds]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

Querying Uncertain Data: Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning / Aggregation

[Figure: query point q with uncertain objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic “best guess”
• Expected positions
• Most-likely positions
• …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

Equivalent Worlds: An intuition

Observation 2

For each remaining object, we only need to consider the predicate “inside”

Remaining number of equivalent classes of possible worlds

[Figure: query point q with the remaining uncertain objects C, D, H, I]

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

Generating Functions

[Figure: query point q with uncertain objects A, B, C, H and probability labels 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects. Substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

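Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i \cdot x) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)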
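Expanding the generating function amounts to multiplying one linear factor per object into a coefficient list. The following is a minimal sketch assuming independent objects; the probabilities 0.8, 0.2, and 0.4 are used as a hypothetical input, and `count_distribution` is an illustrative name.

```python
def count_distribution(probs):
    """Coefficients c_0..c_N of the product of (1 - p + p*x) over all objects.
    c_j is the probability that exactly j objects are inside the query region."""
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

The running time is quadratic in the number of objects, compared to the exponential number of possible worlds.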

Count Queries on Uncertain Data

Example

[Figure: query point q with uncertain objects A, B, C, H and probability labels 0.8, 0.2, 0.4; the generating function is expanded step by step]

The Paradigm of Equivalent Worlds

Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and unbiased

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

Sampling Example

[Figure: query point q with uncertain objects A, B, C, H and probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

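Sampling Confidences

[Figure: query point q with uncertain objects A, B, C, H and probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]

True probability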
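The interval on this slide can be reproduced with the Wald formula. In the sketch below, the count of 54 satisfying samples out of 100 is an assumption, chosen because it is the midpoint of the stated interval; `wald_interval` is an illustrative name.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of the standard normal
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(54, 100)  # assumed estimator 0.54 from 100 sampled worlds
print(round(low, 3), round(high, 3))  # 0.442 0.638
```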

50

Uncertain Spatial Data Management: Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• “Best guess” answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the (X, Y) plane, the 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

• Sequence of (location, time) updates (e.g., GPS-based)

• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

• (location, time, velocity vector) updates: now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: positions p1 and p2 sampled at times t1 and t2 on a road network; possible position q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering / pruning / refinement, e.g., possible routes to be taken

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks T = tb, T = tb1, T = t1, T = t11, T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), …, (Trim, [t(m-1), te], Dm). Each node is refined into children over sub-intervals, e.g., (Tri11, [tb, t11], D11), (Tri12, [t11, t12], D12), and so on for deeper levels]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q, the uncertainty disks of Tr1, …, Tr5, and the distances Rmin, Rmax, and Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_i(R_d) = \iint_{A_i \cap \mathrm{disk}(Q, R_d)} pdf_i(x, y) \, dx \, dy$$

where $A_i$ denotes the uncertainty region of Tri.

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \neq j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
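A discrete analogue makes the structure of this probability easier to see. The sketch below replaces the integral by a sum over a finite set of possible distances per object; it assumes independent objects and no ties in distance, and the data and the name `nn_probabilities` are hypothetical.

```python
def nn_probabilities(distance_pmfs):
    """distance_pmfs[j] is a list of (distance to Q, probability) pairs for object j.
    Returns, per object, the probability of being the nearest neighbor of Q."""
    result = []
    for j, pmf_j in enumerate(distance_pmfs):
        total = 0.0
        for d, p in pmf_j:
            # probability that every other object is farther away than d
            others_farther = 1.0
            for i, pmf_i in enumerate(distance_pmfs):
                if i != j:
                    others_farther *= sum(pi for di, pi in pmf_i if di > d)
            total += p * others_farther
        result.append(total)
    return result


objects = [
    [(1.0, 0.5), (3.0, 0.5)],  # object 0: close or far
    [(2.0, 1.0)],              # object 1: certain distance
    [(5.0, 1.0)],              # object 2: always beyond the others
]
probs = nn_probabilities(objects)
print(probs)  # [0.5, 0.5, 0.0]
```

Object 2 illustrates the pruning observation: its minimum distance exceeds the maximum distance of object 1, so its NN probability is zero.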

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example:

[Figure: uncertain query location TrQ and uncertainty disks of Tr1, …, Tr5 with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1, and Tr2 over (x, y), each defined on an uncertainty disk of diameter 2r with height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq are shown

Main observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P(\lVert V_i - V_q \rVert \le R_d)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable:
Viq = Vi + (–Vq)
which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
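The convolution argument can be illustrated with discrete distributions. The sketch below is one-dimensional for brevity, whereas the slide works with 2D locations; the pmfs and the names `difference_pmf` and `prob_within` are assumptions for illustration.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent discrete Vi and Vq.
    This is the convolution of the pmf of Vi with the pmf of -Vq."""
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)


def prob_within(pmf_diff, rd):
    """P(|Vi - Vq| <= rd), read off the pmf of the difference."""
    return sum(p for x, p in pmf_diff.items() if abs(x) <= rd)


pmf_i = {4: 0.5, 5: 0.5}  # possible positions of Tri (hypothetical)
pmf_q = {0: 0.5, 1: 0.5}  # possible positions of Trq (hypothetical)
diff = difference_pmf(pmf_i, pmf_q)
print(diff)                  # {4: 0.5, 3: 0.25, 5: 0.25}
print(prob_within(diff, 4))  # 0.75
```

Once the pdf of the difference is known, the uncertain query object can be treated as a crisp point at the origin.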

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over (x, y), each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs).
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

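Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

with $A = V_{xiq}^2 + V_{yiq}^2$, $B = 2 (x_0 V_{xiq} + y_0 V_{yiq})$, and $C = x_0^2 + y_0^2$, where $(x_0, y_0)$ is the expected location along TRiq at $t = 0$. This is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)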

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
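The critical time-points of two distance functions follow from equating the squared distances, which leaves a quadratic in t with at most two roots. The sketch below assumes each distance function has the form sqrt(A t² + B t + C) for linear relative motion; `distance_coeffs` and `critical_times` are hypothetical helper names.

```python
import math


def distance_coeffs(x0, y0, vx, vy):
    """(A, B, C) of the squared distance from the origin for a point starting
    at (x0, y0) and moving with relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(c1, c2, tb, te):
    """Times in [tb, te] at which two distance functions intersect."""
    a, b, c = (c1[k] - c2[k] for k in range(3))
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            r = math.sqrt(disc)
            roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


static = distance_coeffs(0.0, 1.0, 0.0, 0.0)   # stays at distance 1
passing = distance_coeffs(-2.0, 0.0, 1.0, 0.0)  # passes through the origin at t = 2
times = critical_times(static, passing, tb=0.0, te=10.0)
print(times)  # [1.0, 3.0]
```

Between consecutive critical time-points the order of the two distance functions does not change, which is what the lower-envelope construction relies on.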

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: lower envelope LE12 of TR1 and TR2 with critical time-points t11 and t12; lower envelope LE34 of TR3 and TR4 with critical time-point t31; merged lower envelope LE1234 over [tb, te] with additional critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

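[Figure: possible paths of object a and its possible locations when t = 4, with segment m–p1 highlighted]

Probability of falling inside segment m–p1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S \text{ at } t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \mid PL^{a}_{p}(t))$$

e.g., uniform pdf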

74

Construction / Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches
– Define acceptable error
– Save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the sink data be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) of a car on a highway, bounded by the speeds vmin and vmax, with a query window Q]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution, the car is inside Q with probability 50% at one point in time and 30% at the next

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob. Stochastic Processes. Wiley, 1953.

91

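A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a row of holes in a wooden board; from an interior hole the ball moves left with probability 0.4, stays with 0.2, or moves right with 0.4; at a border hole it stays with 0.4 and moves to the neighbouring hole with 0.6]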

92

A simple example

› Initial position: the ball is in the third hole

› After first hit: 0.4, 0.2, 0.4 (holes 2 to 4)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (holes 1 to 9)

93

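How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$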

94

How can we model this?

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
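The board example can be replayed with a few lines of plain Python. This is a minimal sketch: `step` and `build_board_matrix` are names chosen here, and the matrix is the one from the slide (rows = from bucket, columns = to bucket). The vector quoted for the 40th hit is the stationary distribution of this chain; the value computed after exactly 40 steps only approaches it, so the sketch checks stationarity separately instead of asserting the 40-step values.

```python
def step(dist, matrix):
    """One hit: row vector times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def build_board_matrix(n=9):
    """Transition matrix of the board example with n holes."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


M = build_board_matrix()
start = [0.0] * 9
start[2] = 1.0  # the ball starts in the third hole

after_1 = step(start, M)
after_2 = step(after_1, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

# Candidate stationary distribution: unchanged by one more hit.
stationary = [0.08] + [0.12] * 7 + [0.08]

print([round(p, 2) for p in after_1])
print([round(p, 2) for p in after_2])
print([round(p, 3) for p in after_40])
```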

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state graph over s1, s2, s3 with the transition probabilities of M as edge labels]

Note: there is an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

Propagating the distribution over (s1, s2, s3) through M:

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Worlds that reach s1 or s2 during T = [2,3] are counted as hits and are not propagated any further:

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2), of which 0.8 (in s2) is a hit
t=3: (0, 0.16, 0.04), of which 0.16 (in s2) is a hit

Result = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
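The window query above can be sketched as a forward propagation in which the probability mass that enters a window state during T is moved to a hit counter. The function and parameter names are illustrative, and states are 0-indexed (s1 = 0, s2 = 1, s3 = 2).

```python
def step(dist, matrix):
    """One tick: row vector times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def window_hit_probability(matrix, start, window_states, window_times, horizon):
    """P(object is in a window state at least once during the window times)."""
    not_hit = list(start)  # mass of worlds that have not hit the window yet
    hit = 0.0
    for t in range(horizon + 1):
        if t > 0:
            not_hit = step(not_hit, matrix)
        if t in window_times:
            for s in window_states:
                hit += not_hit[s]   # these worlds are hits from now on
                not_hit[s] = 0.0    # and are no longer propagated
    return hit, not_hit


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

hit, rest = window_hit_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3)
print(round(hit, 2))  # 0.96
```

The work is linear in the number of ticks, although the number of possible paths is exponential.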

103

ST - Window Queries

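$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

States: s1, s2, s3 and the new state s4, which collects the trajectories that have hit the window.

Distribution over (s1, s2, s3, s4):

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)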
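A sketch of how the two matrices can be derived from M: M− only appends an absorbing state, while M+ additionally redirects every transition into a window state to that absorbing state. The helper name `augment` is an assumption of this sketch, as is the schedule (M+ for every tick inside T, M− otherwise), which is what reproduces the vectors above.

```python
def step(dist, matrix):
    """One tick: row vector times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def augment(matrix, window_states):
    """Return (M_minus, M_plus) with one extra absorbing 'hit' state."""
    n = len(matrix)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in matrix] + [absorbing[:]]
    m_plus = []
    for row in matrix:
        redirected = sum(row[s] for s in window_states)
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [redirected])
    m_plus.append(absorbing[:])
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = augment(M, {0, 1})  # window states s1, s2 (0-indexed)

dist = [1.0, 0.0, 0.0, 0.0]
for t in (1, 2, 3):
    dist = step(dist, M_plus if t in (2, 3) else M_minus)

print([round(p, 2) for p in dist])  # [0.0, 0.0, 0.04, 0.96]
```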

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time; the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

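Distribution over the classes (S1-, S2-, S3-, S1+, S2+, S3+):

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$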

107

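Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

where ∎ denotes the event that the trajectory intersects the query window and obs denotes the observation.

With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+) at t=3:

$$\frac{0.32}{0.32 + 0.04} = 0.89$$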
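The multiple-observation case can be sketched by lifting M to the 2|S| classes and conditioning on the second observation at the end. The helper name `lift` and the schedule (M+ inside T = [2,3], M− otherwise) are assumptions of this sketch; the class order is (S1-, S2-, S3-, S1+, S2+, S3+).

```python
def step(dist, matrix):
    """One tick: row vector times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def lift(matrix, window_states):
    """Build M_minus and M_plus over the classes (S-, S+)."""
    n = len(matrix)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in matrix] + [zero + row[:] for row in matrix]
    m_plus = []
    for row in matrix:  # worlds that have not intersected the window yet
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        move = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + move)
    for row in matrix:  # worlds that already intersected stay in the "+" classes
        m_plus.append(zero + row[:])
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = lift(M, {0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # first observation: s1 at t=0
for t in (1, 2, 3):
    dist = step(dist, M_plus if t in (2, 3) else M_minus)

# Second observation: the object is seen in s3 (index 2) at t=3.
observed = 2
p_hit_and_obs = dist[len(M) + observed]
p_miss_and_obs = dist[observed]
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)

print([round(p, 2) for p in dist])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(posterior, 2))          # 0.89
```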

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: leaf entry with object id and probability over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


6

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

7

Geo-Spatial Data

• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality

• Great research potential
  • New applications
  • Innovative research
  • Economic boost

• "$600 billion potential annual consumer surplus from using personal location data" [1]

[1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011

8

Geo-Spatial Data

9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: continuous function

[Figure: GPS log taken from a thirty-minute drive through Seattle]
Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g. radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g. contradictory sensor data)
  • Human errors (e.g. in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13


Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

bull Discrete Models

bull Continuous Models

Possible World Semantics

04

b

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called possible worlds

• Assumption: the probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let
• $DB$ be an uncertain database,
• $W$ be the set of possible worlds of $DB$,
• $\varphi$ be a query predicate,
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w \in W$ and zero otherwise.

The probability that the query predicate $\varphi$ holds on the uncertain database $DB$ is defined as

$$P(\varphi, DB) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$

Possible World Semantics
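Answering a query by summing over possible worlds can be sketched by brute-force enumeration. Everything in this example is illustrative: the toy database, the object names and the predicate are made up, and objects are assumed to be mutually independent, so the probability of a world is the product of its instance probabilities.

```python
from itertools import product


def query_probability(database, predicate):
    """Sum the probabilities of all possible worlds in which the predicate holds."""
    names = list(database)
    total = 0.0
    for world in product(*(database[name] for name in names)):
        p_world = 1.0
        for _, p in world:
            p_world *= p  # independence assumption
        instances = {name: pos for name, (pos, _) in zip(names, world)}
        if predicate(instances):
            total += p_world
    return total


# Each object has alternative positions with probabilities summing to one.
db = {
    "A": [((1.0, 1.0), 0.8), ((5.0, 5.0), 0.2)],
    "B": [((0.5, 0.0), 0.4), ((6.0, 1.0), 0.6)],
}


def some_object_near_origin(world, radius=2.0):
    """Hypothetical predicate: at least one object within `radius` of (0, 0)."""
    return any(x * x + y * y <= radius * radius for x, y in world.values())


print(round(query_probability(db, some_object_near_origin), 2))  # 0.88
```

The loop visits every combination of instances, which is exactly the exponential blow-up that makes naive processing impractical.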

Possible Worlds Example II

[Figure: uncertain objects A to Z; the possible worlds are enumerated over several animation steps]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N., and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query regionInformation about concrete results objects can be discarded

q

C D

H

I

36

Generating Functions

[Figure: query region around q with objects A, B, C and H; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e. substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i x) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
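The generating-function technique amounts to multiplying one linear factor per object and reading off the coefficients. The sketch below assumes that the three labels 0.8, 0.2 and 0.4 in the figure are the probabilities of the three remaining objects being inside the query region; that reading of the figure is an assumption, and `count_distribution` is a name chosen here.

```python
def count_distribution(probs):
    """Coefficients c_0..c_N of the product of (1 - p + p*x) over all p.

    c_j is the probability that exactly j objects are inside the region,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count unchanged
            nxt[k + 1] += c * p      # object inside: count + 1
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each factor costs O(N) work, so the whole distribution is obtained in O(N²) instead of enumerating 2^N worlds.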

41

Count Queries on Uncertain Data

[Figure: query region around q with objects A, B, C and H; probability labels 0.8, 0.2, 0.4]

Example

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling Example

[Figure: query region around q with objects A, B, C and H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations
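A minimal Monte-Carlo sketch for the count query. It assumes, as an interpretation of the figure, that three independent objects are inside the query region with probabilities 0.8, 0.2 and 0.4; the function name and the fixed seed are choices of this sketch.

```python
import random


def sample_count_distribution(probs, num_samples, rng):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    counts = [0] * (len(probs) + 1)
    for _ in range(num_samples):
        # One possible world: each object is independently inside or outside.
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [c / num_samples for c in counts]


rng = random.Random(42)  # fixed seed so that runs are repeatable
estimate = sample_count_distribution([0.8, 0.2, 0.4], 100, rng)
print(estimate)
```

With only 100 worlds the estimates can be several percentage points away from the exact values, and the sample alone does not tell how far.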

49

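Sampling Confidences

[Figure: query region around q with objects A, B, C and H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution, $\hat{p}$ is the estimated probability and $n$ is the number of samples

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability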
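The interval can be reproduced with the standard Wald formula. The inputs used below are assumptions, since the estimate and the significance level are not legible on the slide: a sample proportion of 0.54 from n = 100 worlds at a significance level of 0.05 yields exactly the quoted bounds.

```python
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * (p_hat * (1.0 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

Quadrupling the number of samples halves the width of the interval.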

50

Uncertain Spatial Data Management Summary

• Motivation
  • Flood of geo-spatial data
  • Enriched with additional contexts (text, social, multimedia)
  • Inherent uncertainty

• Data cleaning
  • "Best guess" answers
  • Unreliable results
  • Biased results

• Paradigm of equivalent worlds
  • Efficient solution for the most prominent types of spatial queries
  • Example: generating functions

• Approximations
  • Monte-Carlo sampling
  • Probabilistic guarantees

51

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, up to the present time, together with its projection onto the X-Y plane, the 2D route]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: time axis with the present time marked as "now"]

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints:
• Motion along a road network
• Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at time t1) and p2 (at time t2), with a query point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time marks T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time marks T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]; the nodes below refine the time interval of their parent]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1 to Tr5; the distances Rmin, Rmax and Rd from Q are marked; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax …)
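A hedged sketch of the two probabilities, written to match the verbal description above. The notation ($P^{WD}_i$, $pdf^{WD}_j$, $P^{NN}_j$) is assumed here and not taken from the slide; $pdf^{WD}_j$ denotes the derivative of $P^{WD}_j$ with respect to $R_d$.

```latex
% Probability that Tr_i lies within distance R_d of Q
P^{WD}_{i}(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q
P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \cdot
             \prod_{i \ne j} \bigl( 1 - P^{WD}_{i}(R_d) \bigr)\, dR_d
```

The product term is the probability that every other object is further away than $R_d$, which presupposes that the object locations are independent.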

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable …

Example:

[Figure: uncertainty disks of TrQ and Tr1 to Tr5, with Rmin and Rmax marked; for the points Z1 and Z2, dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide) …

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2 over their uncertainty disks in the (x, y) plane; disk diameter 2r, pdf value 1/(πr²)]

Example: assuming a uniform pdf of the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) in the (x, y) plane; the support of each convolved pdf has diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

which is a hyperbola, since A > 0 (here $A = V_{xiq}^2 + V_{yiq}^2$; B and C follow from the relative position at the start of the interval)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points

Example:

[Figure: distance functions TR1 and TR2 over time (horizontal axis) between tb, t11 and t12; LE12 is the lower envelope of the two distance trajectories, and the intersection is a critical time-point]
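Why two distance functions meet at most twice can be seen from a short sketch. It assumes linear relative motion, so that each squared distance is a quadratic A·t² + B·t + C, and setting two of them equal leaves a polynomial of degree at most two. The function names and the sample trajectories are illustrative.

```python
import math


def distance_coeffs(p0, v):
    """(A, B, C) such that d(t)^2 = A*t^2 + B*t + C for the relative
    start position p0 and the relative velocity v."""
    a = v[0] ** 2 + v[1] ** 2
    b = 2.0 * (p0[0] * v[0] + p0[1] * v[1])
    c = p0[0] ** 2 + p0[1] ** 2
    return a, b, c


def critical_times(f, g, tb, te):
    """Times in [tb, te] at which the two distance functions are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            r = math.sqrt(disc)
            roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Tr1 keeps a constant distance of 1 from the query object;
# Tr2 approaches it, passes through it at t = 3, and moves away again.
d1 = distance_coeffs((1.0, 0.0), (0.0, 0.0))
d2 = distance_coeffs((-3.0, 0.0), (1.0, 0.0))
print(critical_times(d1, d2, 0.0, 10.0))  # [2.0, 4.0]
```

Between the two critical time-points Tr2 is the closer one, so the lower envelope switches from Tr1 to Tr2 at t = 2 and back to Tr1 at t = 4.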

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three plots of distance over time; the lower envelope LE12 of TR1 and TR2 (critical times t11, t12), the lower envelope LE34 of TR3 and TR4 (critical time t31), and their merge LE1234 over [tb, te] with the new critical times t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.

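[Figure: Distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3, shown twice. The lower envelope is marked together with a band of width 4r; the 2nd lower envelope corresponds to the nodes in Level 2 of the IPAC-NN tree.]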

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: the pdf of selecting a possible path is uniform.

Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

Combine the two probabilities…

[Figure: Possible paths and possible locations of object a when t = 4, with segment m marked.]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(p \in S \mid p \in PL_P(a, t)\bigr)$$

(e.g., uniform pdf)

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on demand

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: Expansion tree “bubbling” along road-network segments from q over the vertices A to E, annotated with earliest/latest times of arrival; one path is possible but definitely not part of the answer to the range query.]
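The network expansion step can be sketched as a Dijkstra search that stops at radius r. The adjacency-list format, the function name `expansion_tree` and the toy graph are assumptions made for illustration. The sketch also tracks vertices only, so edges that are partially covered by the range would need extra handling.

```python
import heapq

def expansion_tree(graph, q, r):
    """Return {vertex: (distance, parent)} for vertices closer than r to q.

    graph maps a vertex to a list of (neighbour, edge_length) pairs.
    """
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd              # shorter route found, still in range
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return {v: (dist[v], parent[v]) for v in dist}

# Hypothetical road network around the query location q.
roads = {
    "q": [("A", 1), ("B", 4)],
    "A": [("q", 1), ("C", 2)],
    "B": [("q", 4)],
    "C": [("A", 2), ("D", 3)],
    "D": [("C", 3)],
}
tree = expansion_tree(roads, "q", 4)
```

The parent pointers give the tree structure; the edges of this tree are the ones whose movement R-trees would be fetched in the filtering step.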

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up the Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
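To make "define an acceptable error, save on data size" concrete, here is a sketch of the plain recursive Douglas-Peucker simplification. It is not the sped-up variant from the cited paper, and it treats points as purely spatial (x, y) pairs. The function names and the sample polyline are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed to stay within eps of the polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [first, last]              # every interior point is close enough
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right              # drop the duplicated split point

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(track, 0.1)
```

With an acceptable error of 0.1, the point (1, 0.05) is dropped while the sharp turn at (3, 2) is kept.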

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series
› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435-443

A highway example…

[Figure: Position (pos) of a car over time (t), bounded by the minimum and maximum speeds vmin and vmax, with the query window Q. Assuming uniform distribution, the figure labels two probabilities of being inside Q: 50% and 30%.]

What is the probability that the car is in some area at least once during some time?

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65. This is a violation of the speed constraints.

Consideration of dependency: 0.5

An adequate model

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes, Wiley

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: Interior hole: 0.4 to the left neighbour, 0.2 stay, 0.4 to the right neighbour. Border hole: 0.6 and 0.4.]

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

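How can we model this?

› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$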

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
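The vector-matrix products above can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper names `board_matrix`, `step` and `after_hits` are illustrative and not from the slides.

```python
def board_matrix(n=9):
    """Transition matrix of the ball-on-board example (row: from, column: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                         # left border
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                   # right border
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                              # interior hole
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(vec, m):
    """One hit: multiply the row vector vec by the matrix m."""
    n = len(vec)
    return [sum(vec[i] * m[i][j] for i in range(n)) for j in range(n)]

def after_hits(vec, m, k):
    """Distribution after k hits, starting from the distribution vec."""
    for _ in range(k):
        vec = step(vec, m)
    return vec

M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, that is, it is unchanged by one more hit. The distribution after the 40th hit approaches it, which is why the slide reports these values in rounded form.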

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: The states s1, s2, s3 with their transition probabilities, unrolled over t = 0, 1, 2, 3.]

Note: We have an exponential number of possible paths the car might take.

Starting from the distribution (1, 0, 0) at t = 0:
› t = 1: (0, 0, 1)
› t = 2: (0, 0.8, 0.2)
› t = 3, ignoring the window: (0.48, 0.16, 0.36)
› t = 3, after setting aside the worlds that were already in s1 or s2 at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories, and two matrices

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
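The matrix-based solution can be sketched as follows. The sketch assumes that M+ is applied for every transition into a time step inside the window and M– otherwise, and that the window does not contain t = 0. The function names are illustrative, not taken from the cited paper.

```python
def step(vec, m):
    """Multiply the row vector vec by the matrix m."""
    n = len(vec)
    return [sum(vec[i] * m[i][j] for i in range(n)) for j in range(len(m[0]))]

def augmented_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing 'winner' state."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        # Probability mass entering a window state is redirected to the winner state.
        redirected = sum(row[j] for j in window_states)
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [redirected])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(m, start, window_states, window_times, horizon):
    """Probability of visiting a window state at some time in window_times."""
    m_minus, m_plus = augmented_matrices(m, window_states)
    vec = list(start) + [0.0]
    for t in range(1, horizon + 1):
        vec = step(vec, m_plus if t in window_times else m_minus)
    return vec[-1]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = augmented_matrices(M, {0, 1})
p = window_probability(M, [1, 0, 0], {0, 1}, {2, 3}, 3)
```

States are 0-indexed here, so s1 and s2 become {0, 1}. The number of multiplications grows linearly with the time horizon, although the number of possible paths grows exponentially.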

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: Location space over time space, once with a single observation at t0 and once with two observations at t0 and t1.]

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

With the class order (S1-, S2-, S3-, S1+, S2+, S3+):

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

State vectors:
› t = 0: (1, 0, 0, 0, 0, 0)
› t = 1: (0, 0, 1, 0, 0, 0)
› t = 2: (0, 0, 0.2, 0, 0.8, 0)
› t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Bayes’ Theorem, where $\blacksquare$ denotes the event that the trajectory intersects the window and $\mathit{obs}$ the observation of the object in s3:

$$P(\blacksquare \mid \mathit{obs}) = \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})} = \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})} = \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$
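The doubled state space and the Bayes step can be sketched as below. The slide does not state when the window applies. The sketch assumes the window covers s1 and s2 at t = 2 only and that the second observation is s3 at t = 3, because that choice reproduces the state vectors shown. All function names are illustrative.

```python
def step(vec, m):
    """Multiply the row vector vec by the matrix m."""
    n = len(vec)
    return [sum(vec[i] * m[i][j] for i in range(n)) for j in range(len(m[0]))]

def doubled_matrices(m, window_states):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = []
    for row in m:                          # rows of the Si- classes
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + hit)
    m_plus += [zero + row[:] for row in m]  # rows of the Si+ classes
    return m_minus, m_plus

def hit_probability_given_observation(m, start_state, window_states,
                                      window_times, obs_time, obs_state):
    """P(window was hit | object observed in obs_state at obs_time)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    vec = [0.0] * (2 * n)
    vec[start_state] = 1.0
    for t in range(1, obs_time + 1):
        vec = step(vec, m_plus if t in window_times else m_minus)
    miss, hit = vec[obs_state], vec[n + obs_state]
    # Assumes the observation is possible at all, i.e. hit + miss > 0.
    return hit / (hit + miss), vec

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
prob, final = hit_probability_given_observation(M, 0, {0, 1}, {2}, 3, 2)
```

Only the two classes that agree with the observation, S3- and S3+, enter the final ratio; the remaining probability mass belongs to worlds that the second observation rules out.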

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

Selected Works

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-Tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: Index entry for an object (oid) in the space of location over time.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform with the observations?
› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
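The sampling idea can be illustrated with a sketch that draws trajectories forward from a single observation and counts how many satisfy a window query. It does not implement the adapted transition matrices of the cited paper, which are needed once samples have to conform with several observations. The function names and the fixed seed are illustrative choices.

```python
import random

def sample_trajectory(m, start_state, horizon, rng):
    """Draw one possible world: a state sequence of length horizon + 1."""
    states = [start_state]
    for _ in range(horizon):
        row = m[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def estimate_window_probability(m, start_state, window_states, window_times,
                                horizon, n_samples, seed=0):
    """Fraction of sampled trajectories that visit the window at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(m, start_state, horizon, rng)
        if any(traj[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, {2, 3}, 3, 20000)
```

For the three-state window query from the earlier slides, the estimate should land close to the exact result of 0.96, with the deviation shrinking as the number of samples grows.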

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Summary

› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Page 4: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

4

Aim of this tutorial hellip

rsaquo Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal query processing

rsaquo A tutorial not a survey

rsaquo Get the big picture hellipndash NOT in terms of a long list of recent methods and

algorithmsndash BUT in terms of general concepts commonly used in this field

5

Outline

rsaquo Tutorial decomposed into three partsndash Uncertain Spatial Data (Andreas Zuumlfle)ndash Uncertain Spatio-Temporal Data (Geometric Approach) (Goce

Trajcevski)ndash Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias

Emrich)

rsaquo Please feel free to ask questions at any time during the presentation

rsaquo The latest version of these slides will be made available within the next week

httpwwwdbsifilmude~zuefle

6

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

7

Geo-Spatial Data

bull Huge flood of geo-spatial databull Modern technologybull New user mentality

bull Great research potentialbull New applicationsbull Innovative researchbull Economic Boost

bull ldquo$600 billion potential annual consumer surplus from using personal location datardquo [1]

[1] McKinsey Global Institute Big data The next frontier forinnovation competition and productivity June 2011

8

Geo-Spatial Data

9

10

Spatio-Temporal Data

bull (object location time) triples

bull Queries bull ldquoFind friends that

attended the same concert last saturdayrdquo

bull Best case Continuous function

GPS log taken from a thirty minute drive through SeattleDataset provided by P Newson and J Krumm Hidden Markov Map Matching Through Noise and Sparseness ACMGIS 2009

11

Sources of Uncertainty

bull Missing Observationsbull Missing GPS signalbull RFID sensors available in discrete locations onlybull Wireless sensor nodes sending infrequently to preserve energybull Infrequent check-ins of users of geo-social networks

Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011

12

Sources of Uncertainty

bull Uncertain Observationsbull Imprecise sensor measurements (eg radio triangulation Wi-Fi

positioning)bull Inconsistent information (eg contradictive sensor data)bull Human errors (eg in crowd-sourcing applications)

From database perspective the position of a mobile object is uncertain

Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

14

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

bull Discrete Models

bull Continuous Models

Possible World Semantics

04

b

17

Possible World Semantics

bull A collection of uncertain spatial objects defines an uncertain spatial database

bull Combinations of object instances define possible database instances calledPossible Worlds

bull Assumption The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let bull be an uncertain database having possible worlds bull be the set of possible worlds of bull be a query predicate bull be an indicator function returning one if predicate holds in

world and zero otherwise

The probability that a query predicate holds on an uncertain database is defined as

Possible World Semantics

Possible Worlds Example II

A B

G

D E FC

H I J K

L

M

N OP

Q

R

U

T

S

V W X Y Z

20

Possible Worlds Example II

A B

G

D E FC

H I J K

L

M

N OP

Q

R

U

T

S

V W X Y Z

21

>

22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive Query Processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data

In general No

ldquoThe problem of answering queries on a probabilistic database D is -complete in the size of Dldquo[DalviSuciu04]

Can be reduced to uncertain spatial databases

But Specific queries may have polynomial time solutions[DalviSuciu04] Dalvi N N and Suciu D Efficient query evaluation on probabilistic databasesIn Proceedings of the 30th International Conference on Very Large Data Bases (VLDB)(2004)

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic ldquobest guessrdquo

Expected Positions

Most-likely Positions

hellip

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows to discard objects outside ofacute the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query regionInformation about concrete results objects can be discarded

q

C D

H

I

36

q

08 02

04

H

C A

B

H3

Generating Functions

Main idea Use polyomial multiplication to enumeratepossible results

37

q

08 02

04

H

x x

x

H3

Generating Functions

Main idea Use polyomial multiplication to enumeratepossible results

Observation 3 Anonymize Objects - Substitute ABC by x

38

q

08 02

04

H

x x

x

H3

Generating Functions

Main idea Use polyomial multiplication to enumeratepossible results

Observation 3 Anonymize Objects - Substitute ABC by x

39

q

08 02

04

H

x x

x

H3

Generating Functions

Main idea Use polyomial multiplication to enumeratepossible results

Observation 3 Anonymize Objects - Substitute ABC by x

Each monomial implies that the probability of having exactly results equals

40

For each object let Consider the following generating function [2]

in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region

Generating Functions Formally

[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied

I A traditional query on certain data can be answered in polynomial time

II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples drawn independent and unbiased

Evaluated the query predicate on each world

Distribution of sampled results is an unbiased approximation of the true distribution of results

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the followingestimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the followingestimators

Use statistical methods to assess the quality of estimators

Eg Wald-Test

Where is the percentile of the standard normal distribution

At a significance level of the true probability is in the interval [0442 0638]

True probability

50

Uncertain Spatial Data Management Summary

Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty

Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results

Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions

Approximations Monte-Carlo sampling Probabilistic guarantees

51

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

52

bull Modelsbull Motionbull Location Uncertainty

bull Queriesbull Part 1 NN (free-space motion)

bull Semantics and Processing

bull Part 2 Range (road-networks constraints)bull Semantics and Processing

bull Uncertainty ndash the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation does not capture accelerationdeceleration

Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes

Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011

54

Mobility Models

sequence of (locationtime)updates (eg GPS-based)

Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory

(locationtime Velocity Vector )updates

now -gt

55

Modeling UncertaintyLocation Uncertainty Models

Various Sourcesrsquo Imprecision(GPS Sensors)

Quest for data reduction

Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009

56

Modeling Uncertain Trajectories

bull Model 1Cones Beads Necklaces (AKA space-time prisms)

bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples

Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010

57

Modeling Uncertain TrajectoriesModel 2

Constraints

Motion along road network

Uncertainty restricted to Road Segments

p1 p2

q

t2t1

t

Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain TrajectoriesModel 3Sheared Cylinders

Constant bound on location-error at anytime-instant

G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time planes T = tb, T = t1 and T = te]

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time planes T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level children: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper nodes subdivide the interval of their parent, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5; labels Rmin, Rmax, Rd1 and the annotation Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

[equation not preserved]

The probability that a given object Trj is the NN of Q is given by

[equation not preserved]

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
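The two expressions referred to on this slide did not survive extraction. The following is a hedged reconstruction from the verbal description above, assuming independent objects whose locations follow pdfs pdf_i over uncertainty disks U_i. The symbols P_i^{WD}, pdf_j^{WD} and U_i are notation chosen here for illustration and are not taken from the slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
P_i^{WD}(R_d) = \iint_{U_i \,\cap\, \mathrm{disk}(Q, R_d)} \mathit{pdf}_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d, all others are farther away, integrated over R_d
P_j^{NN} = \int_{0}^{R_{max}} \mathit{pdf}_j^{WD}(R_d)
           \prod_{i \neq j} \bigl(1 - P_i^{WD}(R_d)\bigr)\, dR_d,
\qquad
\mathit{pdf}_j^{WD}(R_d) = \frac{d}{dR_d} P_j^{WD}(R_d)
```

The upper integration bound R_max reflects the pruning observation: beyond R_max the product term vanishes, because at least one object is certainly closer.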

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each with a support of diameter 2r]

Example, assuming a uniform pdf over the possible locations: these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(|Vi − Vq| ≤ Rd)

Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq
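The convolution argument can be illustrated in a deliberately simplified setting. The sketch below assumes one-dimensional, integer-valued locations with finite pmfs (the slide works with 2D pdfs); the function names `difference_pmf` and `prob_within` are hypothetical and chosen only for this example.

```python
def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent V_i, V_q on {0, 1, 2, ...}.

    pmf_i[k] = P(V_i = k), pmf_q[k] = P(V_q = k).
    Returns (offsets, pmf) with pmf[m] = P(V_i - V_q = offsets[m]).
    This equals the convolution of pmf_i with the pmf of -V_q.
    """
    n, m = len(pmf_i), len(pmf_q)
    diff = [0.0] * (n + m - 1)
    for a, pa in enumerate(pmf_i):
        for b, pb in enumerate(pmf_q):
            # value a - b is stored at index (a - b) + (m - 1)
            diff[a - b + (m - 1)] += pa * pb
    offsets = list(range(-(m - 1), n))
    return offsets, diff


def prob_within(pmf_i, pmf_q, rd):
    """P(|V_i - V_q| <= rd), read off the pmf of the difference."""
    offsets, diff = difference_pmf(pmf_i, pmf_q)
    return sum(p for off, p in zip(offsets, diff) if abs(off) <= rd)
```

Once the pmf of the difference is available, the uncertain query object can be treated as if it were crisp and located at the origin, which is what makes the single-integral formulation of the previous slides applicable again.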

67

[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane, each with a support of diameter 4r]

Let Viq = Vi + (−Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

[equation not preserved]

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
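The distance function itself was lost in extraction. A hedged reconstruction, assuming linear motion and writing (x_{iq}, y_{iq}) for the expected relative location at the reference time t = 0 (this symbol and the coefficient names B and C are assumptions; only A appears on the slide):

```latex
d_{iq}(t) = \sqrt{\bigl(x_{iq} + V_{x,iq}\, t\bigr)^2 + \bigl(y_{iq} + V_{y,iq}\, t\bigr)^2}
          = \sqrt{A\, t^2 + B\, t + C}

A = V_{x,iq}^2 + V_{y,iq}^2, \qquad
B = 2\,\bigl(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}\bigr), \qquad
C = x_{iq}^2 + y_{iq}^2
```

With a non-zero relative velocity, A > 0, and the graph of d_{iq}(t) is one branch of a hyperbola, as stated on the slide.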

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points

Example

[Figure: lower envelope LE12 of two distance functions TR1 and TR2, with critical time points t11 and t12 after tb; distance on the vertical axis, time on the horizontal axis]
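Why two distance functions meet at most twice can be made concrete. The sketch below assumes each distance function has the form sqrt(A t^2 + B t + C) and is passed as a tuple (A, B, C); that representation and the name `critical_times` are assumptions made for illustration only.

```python
import math


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal.

    f and g are (A, B, C) tuples describing sqrt(A*t^2 + B*t + C).
    Squaring both sides of f(t) = g(t) leaves a quadratic in t,
    hence at most two critical time points.
    """
    a = f[0] - g[0]
    b = f[1] - g[1]
    c = f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical or never equal: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]
```

For example, `critical_times((1, 0, 1), (0, 0, 2), -2, 2)` compares sqrt(t^2 + 1) with the constant sqrt(2) and yields the two crossings at t = -1 and t = 1.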

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 (of TR1 and TR2, critical time points t11 and t12) and the lower envelope LE34 (of TR3 and TR4, critical time point t31) are merged into LE1234 over [tb, te]; the merge introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and a band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,

[equation not preserved]

Example: if the pdf of selecting a possible path is uniform

[equation not preserved]

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t), i.e.,

[equation not preserved]
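The three formulas of this slide were lost in extraction. One way to write down the definitions stated in words above, as a hedged sketch: tc_min(·) denotes a minimum travel-time cost and PL_j(t) the set of possible locations; both names are assumptions introduced here, not the slide's notation.

```latex
% Possible paths between two consecutive samples
PP_i(a) = \bigl\{\, P_j \;:\; P_j \text{ connects } p_i \text{ and } p_{i+1},\;
          tc_{min}(P_j) \le t_{i+1} - t_i \,\bigr\}

% Uniform pdf over the possible paths
\Pr(P_j) = \frac{1}{|PP_i(a)|}

% Possible locations on path P_j at time t in [t_i, t_{i+1}]
PL_j(t) = \bigl\{\, p \in P_j \;:\; tc_{min}(p_i, p) \le t - t_i \;\wedge\;
          tc_{min}(p, p_{i+1}) \le t_{i+1} - t \,\bigr\}
```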

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4; labels a, m, p1]

Probability of falling inside segment m:

0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., uniform pdf):

\[
\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(S \mid PL_p(t)\bigr)
\]

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; expansion tree "bubbling" along road-network segments]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

– Easy to prove that the actual QP is monotonic in between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
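As an illustration of "define an acceptable error, save on data size", here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is the textbook variant, not the sped-up algorithm of the cited paper, and it treats points as purely spatial (x, y) pairs, ignoring time; the function names are chosen for this example.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Keep a subset of points so that every dropped point lies within
    epsilon of the segment that replaces it."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, -1.0
    for k in range(1, len(points) - 1):
        d = _point_segment_distance(points[k], first, last)
        if d > dmax:
            idx, dmax = k, d
    if dmax <= epsilon:
        return [first, last]  # everything in between is within tolerance
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves
```

The tolerance epsilon plays the role of the acceptable error: the reduced trajectory is smaller, and the discarded detail becomes bounded location uncertainty.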

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?

– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

vs.

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

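A highway example…

[Figure: position (pos) over time (t), with query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example…

[Figure: position (pos) over time (t), with query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example…

[Figure: position (pos) over time (t), with query window Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: position (pos) over time (t), with query window Q covering 50% and 30% of the possible positions at two time instants]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t), with query window Q covering 50% and 30% of the possible positions at two time instants]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t), with query window Q covering 50% and 30% of the possible positions at two time instants]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5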

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; interior hole: 0.4 (left), 0.2 (stay), 0.4 (right); border hole: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

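› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]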

94

How can we model this

› First hit:

\[
(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M = (0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0)
\]

› Second hit:

\[
(0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0.16\;\;0.16\;\;0.36\;\;0.16\;\;0.16\;\;0\;\;0\;\;0\;\;0)
\]

› 40th hit:

\[
(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M^{40} \approx (0.08\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.08)
\]
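The repeated vector-matrix products can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper names `vec_mat`, `board_matrix` and `distribution_after` are chosen for illustration, and the nine-bucket matrix is the M given on the previous slide.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def board_matrix(n=9):
    """Transition matrix of the wooden-board example with n buckets."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4  # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def distribution_after(start, m, hits):
    """Distribution over buckets after the given number of hits."""
    v = list(start)
    for _ in range(hits):
        v = vec_mat(v, m)
    return v


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]          # ball starts in the third bucket
after_1 = distribution_after(start, M, 1)    # mass 0.4, 0.2, 0.4 on buckets 2-4
after_2 = distribution_after(start, M, 2)    # mass 0.16, 0.16, 0.36, 0.16, 0.16
after_40 = distribution_after(start, M, 40)  # close to the stationary distribution
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M, i.e. it is left unchanged by a further multiplication with M; after 40 hits the chain has moved toward it but need not match it exactly.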

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: state graph over s1, s2, s3 whose edges are labelled with the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

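› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

Distribution over (s1, s2, s3):
t = 0: (1, 0, 0)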

98

99

ST - Window Queries

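› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

Distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)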

100

ST - Window Queries

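› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

Distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)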

101

ST - Window Queries

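› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

Distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)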

102

ST - Window Queries

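› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

Distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3, following only the worlds that were not in s1 or s2 at t = 2: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories (those that have entered the window), and two matrices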

103

ST - Window Queries

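State order: (s1, s2, s3, s4), where s4 is the new state for trajectories that have entered the window

\[
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

Distributions:
t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)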
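The two-matrix construction can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: M− keeps the original transitions, M+ redirects every transition that enters a window state to an additional absorbing state, and M+ is applied for each transition that arrives at a time inside the query interval. The function name and its parameters are assumptions made for this example.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def window_query_probability(m, start, window_states, t_start, t_end):
    """P(object is in one of window_states at some t in [t_start, t_end]).

    m: n x n transition matrix, start: distribution at t = 0,
    window_states: set of state indices, times are integer ticks.
    """
    n = len(m)
    hit = n  # index of the additional absorbing "hit" state
    # M-: original transitions, hit state absorbing
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    # M+: transitions into a window state are redirected to the hit state
    m_plus = []
    for row in m:
        new_row = [0.0 if j in window_states else p for j, p in enumerate(row)]
        new_row.append(sum(row[j] for j in window_states))
        m_plus.append(new_row)
    m_plus.append([0.0] * n + [1.0])

    v = list(start) + [0.0]
    if t_start == 0:  # mass already inside the window at t = 0 counts as hit
        for j in window_states:
            v[hit] += v[j]
            v[j] = 0.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t_start <= t <= t_end else m_minus)
    return v[hit]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
p = window_query_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

For the example of the slide (start in s1, window states s1 and s2, T = [2, 3]) the intermediate vectors are (0, 0, 1, 0), (0, 0, 0.2, 0.8) and (0, 0, 0.04, 0.96), so `p` matches the result 0.96 without enumerating individual paths.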

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time; one with a single observation at t0, one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

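[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

State order: (S1−, S2−, S3−, S1+, S2+, S3+)

Distributions:
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\]

\[
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]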

107

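Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the window, and obs the observation (object seen in s3):

\[
P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)}
= \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
\]

Distribution over (S1−, S2−, S3−, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89
\]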
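The doubled state space and the final Bayes step can be sketched as below. This is a hedged illustration under simplifying assumptions: the query window does not include t = 0, the second observation is made at or after the end of the window, and it has non-zero probability. The function name and parameters are hypothetical.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def hit_probability_given_observation(m, start, window_states,
                                      t_start, t_end, obs_state, obs_time):
    """P(trajectory intersects the window | object observed in obs_state
    at obs_time), using the classes S_i^- (index i) and S_i^+ (index n + i)."""
    n = len(m)
    zero = [0.0] * n
    # M-: block-diagonal, both halves evolve with the original matrix
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    # M+: from S^-, entering a window state switches to the S^+ half
    m_plus = []
    for row in m:
        stay = [0.0 if j in window_states else p for j, p in enumerate(row)]
        switch = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + switch)
    m_plus += [zero + row[:] for row in m]  # once hit, always hit

    v = list(start) + zero
    for t in range(1, obs_time + 1):
        v = vec_mat(v, m_plus if t_start <= t <= t_end else m_minus)
    p_hit, p_miss = v[n + obs_state], v[obs_state]
    # Bayes: P(hit | obs) = P(hit and obs) / (P(hit and obs) + P(miss and obs))
    return p_hit / (p_hit + p_miss)


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = hit_probability_given_observation(
    M, [1.0, 0.0, 0.0], {0, 1}, 2, 3, obs_state=2, obs_time=3)
```

With the example values the vector at t = 3 is (0, 0, 0.04, 0.48, 0.16, 0.32), so `posterior` equals 0.32 / (0.32 + 0.04), in line with the 0.89 on the slide.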

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index over the space-time domain; axes: time, location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
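The sampling idea can be sketched for the simplest case: a single observation at t = 0, so that sampled trajectories trivially conform with the observations. Conditioning on later observations requires the adapted transition matrices mentioned above and is not shown. The window query from the earlier slides serves as the test predicate; function names and default parameters are assumptions for this example.

```python
import random


def sample_trajectory(m, start_state, steps, rng):
    """Draw one trajectory (list of state indices) from the Markov chain m."""
    states = [start_state]
    for _ in range(steps):
        row = m[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_window_probability(m, start_state, window_states,
                                t_start, t_end, samples=20000, seed=0):
    """Monte-Carlo estimate: fraction of sampled trajectories that are in
    one of window_states at some time in [t_start, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        traj = sample_trajectory(m, start_state, t_end, rng)
        if any(traj[t] in window_states for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3)
```

For this query the exact answer is 0.96, so the estimate should land close to that value; the same sampler works for predicates (such as kNN) that have no comparably simple matrix formulation.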

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

6

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

7

Geo-Spatial Data

• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality

• Great research potential
  • New applications
  • Innovative research
  • Economic boost

• "$600 billion potential annual consumer surplus from using personal location data" [1]

[1] McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity. June 2011

8

Geo-Spatial Data

9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: Continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing Observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain Observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

14

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let
• DB be an uncertain database
• W be the set of possible worlds of DB
• φ be a query predicate
• I(φ, w) be an indicator function returning one if predicate φ holds in world w ∈ W, and zero otherwise

The probability that the query predicate φ holds on the uncertain database DB is defined as

\[
P(\varphi(DB)) = \sum_{w \in W} I(\varphi, w) \cdot P(w)
\]
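A naive evaluation under possible world semantics simply enumerates every world. The sketch below assumes a simplified uncertain database in which each object is, independently, either inside or outside a query region with a known probability; the function name and the probabilities 0.8, 0.2 and 0.4 are illustrative assumptions.

```python
from itertools import product


def predicate_probability(inside_probs, predicate):
    """Sum the probabilities of all possible worlds in which predicate holds.

    inside_probs[i] is the probability that object i is inside the region;
    a world is a tuple of booleans, one per object (independence assumed).
    """
    total = 0.0
    for world in product([False, True], repeat=len(inside_probs)):
        p_world = 1.0
        for inside, p in zip(world, inside_probs):
            p_world *= p if inside else 1.0 - p
        if predicate(world):  # indicator function of the query predicate
            total += p_world
    return total


probs = [0.8, 0.2, 0.4]
p_exactly_two = predicate_probability(probs, lambda w: sum(w) == 2)
p_at_least_one = predicate_probability(probs, lambda w: any(w))
```

The loop visits 2^n worlds for n objects, which is exactly the exponential blow-up that the later slides on equivalent worlds and generating functions avoid.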

Possible World Semantics

Possible Worlds Example II

[Figure: example database of uncertain spatial objects labelled A to Z]

20

Possible Worlds Example II

[Figure: example database of uncertain spatial objects labelled A to Z]

21

22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic ldquobest guessrdquo

Expected Positions

Most-likely Positions

hellip

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query regionInformation about concrete results objects can be discarded

q

C D

H

I

36

Generating Functions

[Figure: query point q with uncertain objects A, B, C and H; annotated probabilities 0.8, 0.2 and 0.4]

Main idea: use polynomial multiplication to enumerate possible results.

Observation 3: anonymize objects. Substitute A, B, C by x.

Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$.

40

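Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{n} \left( (1 - p_i) + p_i \cdot x \right) = \sum_{k=0}^{n} c_k \cdot x^k$$

In the expanded polynomial, the coefficient $c_k$ of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)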
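The generating-function idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the objects are independent and that the three annotated values 0.8, 0.2 and 0.4 from the running example are the probabilities of three objects being inside the query region. The function name `count_distribution` is hypothetical.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    probs[i] is the (assumed) probability that object i lies inside the
    query region. The list of coefficients is the expanded polynomial
    prod_i ((1 - p_i) + p_i * x).
    """
    coeffs = [1.0]  # the polynomial "1": no object processed yet
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities taken from the running example figure.
dist = count_distribution([0.8, 0.2, 0.4])
for k, prob in enumerate(dist):
    print(f"P(exactly {k} objects inside) = {prob:.3f}")
```

Each multiplication step costs time linear in the current polynomial length, so n objects need O(n²) operations instead of enumerating 2ⁿ possible worlds.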

41

Count Queries on Uncertain Data

Example

[Figure: query point q with uncertain objects A, B, C and H; annotated probabilities 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.


47

Approximated Results: Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each sampled world.

The distribution of sampled results is an unbiased approximation of the true distribution of results.
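The sampling procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions: objects are independent, a "world" is reduced to the inside/outside decision per object, and the probabilities 0.8, 0.2 and 0.4 are taken from the running example. The function name and the fixed seed are hypothetical choices.

```python
import random
from collections import Counter


def sample_count_distribution(probs, num_samples, seed=0):
    """Estimate the distribution of the result count by Monte-Carlo sampling."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = Counter()
    for _ in range(num_samples):
        # Draw one possible world: each object is inside with probability p.
        inside = sum(1 for p in probs if rng.random() < p)
        # Evaluate the count query on this world.
        counts[inside] += 1
    return [counts[k] / num_samples for k in range(len(probs) + 1)]


estimates = sample_count_distribution([0.8, 0.2, 0.4], num_samples=20000)
print(estimates)
```

With only 100 samples, as on the following slides, the estimates fluctuate noticeably, which is why a confidence statement is needed.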

48

Sampling: Example

[Figure: query point q with uncertain objects A, B, C and H; annotated probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

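Sampling: Confidences

[Figure: query point q with uncertain objects A, B, C and H; annotated probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators.

E.g. Wald test: for an estimate $\hat{p}$ obtained from $|S|$ samples, the true probability lies in the interval

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1 - \alpha/2)$ percentile of the standard normal distribution.

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638].

True probability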
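The Wald interval can be computed with the Python standard library. The sketch below is an illustration, not the slide's original computation: the estimate p̂ = 0.54 and the level α = 0.05 are assumptions, chosen because together with 100 samples they reproduce the interval [0.442, 0.638] quoted above.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    # Percentile of the standard normal distribution (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed values: estimate 0.54 from 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(f"[{low:.3f}, {high:.3f}]")
```

The interval shrinks with the square root of the number of samples, so halving its width requires four times as many sampled worlds.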

50

Uncertain Spatial Data Management: Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: "best guess" answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
• Motion
• Location Uncertainty

• Queries
• Part 1: NN (free-space motion)
• Semantics and Processing

• Part 2: Range (road-network constraints)
• Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, drawn up to the present time, together with the corresponding 2D route]

The approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
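The bead (space-time prism) model can be made concrete with a simple membership test. This is a sketch under the usual assumptions of this model, namely free-space motion, two exact samples and a known speed bound `v_max`; all names are hypothetical.

```python
import math


def in_bead(p, t, sample1, sample2, v_max):
    """Check whether location p at time t is possible between two exact samples.

    sample1 = (p1, t1) and sample2 = (p2, t2) are exact location samples;
    v_max is the assumed bound on the speed in-between.
    """
    (p1, t1), (p2, t2) = sample1, sample2
    if not (t1 <= t <= t2):
        return False
    # p must be reachable from p1 in time, and p2 must be reachable from p.
    reachable_from_start = math.dist(p1, p) <= v_max * (t - t1)
    can_reach_end = math.dist(p, p2) <= v_max * (t2 - t)
    return reachable_from_start and can_reach_end


start, end = ((0.0, 0.0), 0.0), ((10.0, 0.0), 10.0)
print(in_bead((5.0, 3.0), 5.0, start, end, v_max=1.5))  # inside the bead
print(in_bead((5.0, 6.0), 5.0, start, end, v_max=1.5))  # too far off the route
```

At any fixed time the possible locations form the intersection of two disks; a sequence of such beads over consecutive samples is what the slides call a necklace.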

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: road-network space-time prism between samples (p1, t1) and (p2, t2) over the time axis t, with location q]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space with uncertainty, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations:

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper levels subdivide the interval of their parent, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], …, [Tri1k1, t1(k-1), t1, D1k1]]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

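Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and uncertain objects Tr1 to Tr5, each bounded by a circle; labels Rmin, Rmax and Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{A_i(R_d)} pdf_i(x, y) \, dx \, dy$$

where $A_i(R_d)$ is the part of the uncertainty region of Tri that lies within distance $R_d$ from Q.

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \neq j} \left( 1 - P^{WD}_{i,Q}(R_d) \right) dR_d$$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)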

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and uncertain objects Tr1 to Tr5; labels Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over uncertainty disks in the (x, y) plane]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
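The convolution argument can be illustrated on a discretized, one-dimensional toy case. The sketch below is not the 2D continuous computation from the slides: it assumes independent locations given as probability mass functions over integer positions, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete V_i and V_q.

    This is the convolution of pmf_i with the mirrored pmf of V_q.
    """
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)


def prob_within(pmf_diff, rd):
    """P(|V_i - V_q| <= rd), read off directly from the difference pmf."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)


# Toy example: object i is at position 4 or 5, the query object at 0 or 1.
pmf_i = {4: 0.5, 5: 0.5}
pmf_q = {0: 0.5, 1: 0.5}
pmf_iq = difference_pmf(pmf_i, pmf_q)
print(pmf_iq)
print(prob_within(pmf_iq, rd=4))
```

Once the difference variable is available, the uncertain query object can be treated as a crisp point at the origin, which is what makes the pruning observations of the crisp case applicable again.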

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

which is a hyperbola, since A > 0.

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

Example:

[Figure: distance functions of TR1 and TR2 over time (time dimension on the horizontal axis, distance on the vertical axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
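The "at most two intersections" observation follows from setting the squared distance functions equal, which yields a quadratic equation in t. The sketch below illustrates this for two objects; it assumes linear relative motion (position and velocity relative to the query object), and the helper names and numbers are hypothetical.

```python
import math


def squared_distance_coeffs(rel_pos, rel_vel):
    """Coefficients (A, B, C) with d(t)^2 = A*t^2 + B*t + C."""
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def critical_time_points(ci, cj, tb, te):
    """Times in [tb, te] at which the two distance functions intersect."""
    a, b, c = (u - v for u, v in zip(ci, cj))
    if abs(a) < 1e-12:  # difference is linear (or constant)
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Two hypothetical objects, given relative to the querying trajectory.
c1 = squared_distance_coeffs((0.0, 2.0), (1.0, 0.0))
c2 = squared_distance_coeffs((-5.0, 1.0), (2.0, 0.0))
crit = critical_time_points(c1, c2, tb=0.0, te=10.0)
print(crit)
```

Between two consecutive critical time-points the nearer of the two objects does not change, so the lower envelope of the pair is described by at most three pieces.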

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31), both over [tb, te], are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2 and t3; the lower envelope, the pruning band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj in PPi(a), the possible locations of a given moving object a with respect to Pj at a time t in [ti, ti+1] are the set of all the positions p on Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).
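The definition of the possible paths can be sketched as a bounded depth-first enumeration. This is an illustration only: the road network, the vertex names and the edge travel times below are hypothetical, the graph is treated as directed for brevity, and only simple paths are enumerated.

```python
def possible_paths(graph, source, target, time_budget):
    """Enumerate simple paths from source to target within the time budget.

    graph maps each vertex to {neighbor: minimum travel time of that edge}.
    Returns a list of (path, minimum time cost) pairs.
    """
    results = []

    def dfs(vertex, path, cost):
        if cost > time_budget:
            return  # cannot be traversed in the available time
        if vertex == target:
            results.append((list(path), cost))
            return
        for nxt, edge_cost in graph.get(vertex, {}).items():
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path, cost + edge_cost)
                path.pop()

    dfs(source, [source], 0.0)
    return results


# Hypothetical road network with minimum travel times per edge.
road_network = {
    "v1": {"v2": 4, "v3": 2},
    "v3": {"v4": 2},
    "v4": {"v2": 2},
    "v2": {},
}
paths = possible_paths(road_network, "v1", "v2", time_budget=6)
uniform_path_probability = 1.0 / len(paths)
print(paths, uniform_path_probability)
```

With a uniform pdf over the two possible paths, each path is taken with probability 0.5, matching the kind of example used on the following slide.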

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4; segment m]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S, t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\left(a \in S \mid PL_p(t)\right)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
› Small, in-memory
– Movement R-tree
› 1D, for each edge
– Trajectories list
› Store samples ordered by time-stamp
› Assign pointer to set of possible paths
› Earliest-arriving and latest-departure times stored for vertices
› On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" from q along road-network segments through the vertices A, B, C, D and E; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
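Trading a bounded error for a smaller data size can be illustrated with the basic Douglas-Peucker line simplification. Note that the sketch below is the plain recursive algorithm, not the sped-up variant of the cited paper, and that it ignores the time dimension of a trajectory; the function names and sample points are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # every inner point is within the error
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(route, epsilon=0.5))
```

The chosen epsilon is exactly the "acceptable error": every dropped point is known to lie within epsilon of the transmitted polyline, which turns data reduction into a bounded location uncertainty.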

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

vs.

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car on a highway over time (t), bounded by the speeds vmin and vmax, with a query region Q; the annotated probabilities of being inside Q at two time instants are 50% and 30%]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov Chain)
– Discrete time + continuous space (e.g. Harris Chain)
– Continuous time + discrete space (e.g. Markov Process)
– Continuous time + continuous space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; in the interior the ball moves left with probability 0.4, stays with 0.2 and moves right with 0.4; at the border the probabilities are 0.6 and 0.4]

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

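› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$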

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
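The bucket example can be replayed with plain vector-matrix multiplication. The following sketch is an illustration with hypothetical helper names; it assumes the transition probabilities of the example (0.4 / 0.2 / 0.4 in the interior, 0.4 stay and 0.6 move at the borders) and a ball that starts in the third bucket.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector dist with the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(start, m, hits):
    dist = list(start)
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in fortieth])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain: multiplying it with M returns the same vector, and the distribution after many hits moves towards it regardless of the initial bucket.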

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state diagram over s1, s2 and s3 with the transition probabilities of M, unrolled over the time steps t = 0, t = 1, t = 2 and t = 3]

Note: we have an exponential number of possible paths the car might take.

Starting in s1 at t = 0, the state distribution evolves as follows:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

At t = 2 the car is in the window with probability 0.8. Of the remaining worlds (probability 0.2 in s3), a share of 0.16 reaches s2 at t = 3 and 0.04 stays in s3. Result = 0.8 + 0.16 = 0.96

› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

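$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State order: (s1, s2, s3, s4)

$(1, 0, 0, 0) \rightarrow (0, 0, 1, 0) \rightarrow (0, 0, 0.2, 0.8) \rightarrow (0, 0, 0.04, 0.96)$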
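The two-matrix construction can be sketched as follows. This is one reading of the slides, stated as an assumption: an extra absorbing state collects all worlds that have already hit the window, M⁻ is used for transitions into time steps outside the query interval, and M⁺ (which redirects transitions into window states to the absorbing state) is used for transitions into time steps inside it. The function names are hypothetical, and the sketch assumes that the query interval starts after t = 0.

```python
def step(dist, m):
    """Multiply the row vector dist with the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def build_matrices(m, window):
    """Return (m_minus, m_plus), both with one additional absorbing hit state."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            # Transitions into a window state are redirected to the hit state.
            new_row[n if j in window else j] += p
        m_plus.append(new_row)
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(m, window, start, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = build_matrices(m, window)
    dist = list(start) + [0.0]
    for t in range(1, t_to + 1):
        dist = step(dist, m_plus if t_from <= t <= t_to else m_minus)
    return dist[-1]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
result = window_probability(M, window={0, 1}, start=[1, 0, 0], t_from=2, t_to=3)
print(round(result, 2))
```

For comparison, treating the two time steps as independent would combine the marginal probabilities 0.8 (at t = 2) and 0.64 (at t = 3) into 1 – 0.2 · 0.36 = 0.928, which differs from the exact value because the positions at consecutive time steps are correlated.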

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over the time steps t = 0, t = 1, t = 2 and t = 3]

106

Multiple Observations

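State order: (S1-, S2-, S3-, S1+, S2+, S3+), where + marks worlds that have intersected the window (■) and - marks worlds that have not (¬■)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$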

107

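Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ■ denoting "the trajectory has intersected the window" and obs denoting the observation:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

For the distribution (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$$\frac{0.32}{0.32 + 0.04} = 0.89$$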
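The conditioning step amounts to restricting the class probabilities to the observed state and renormalizing. A minimal sketch with a hypothetical function name follows; the distribution is the six-class vector at t = 3 from the preceding slide, ordered as (S1-, S2-, S3-, S1+, S2+, S3+).

```python
def posterior_hit_probability(dist, observed_state, num_states):
    """P(window was hit | object observed in the given state).

    dist holds the probabilities of the classes S1-, ..., Sn-, S1+, ..., Sn+.
    observed_state is the 0-based index of the state of the observation.
    """
    p_miss = dist[observed_state]              # observed state, window not hit
    p_hit = dist[num_states + observed_state]  # observed state, window hit
    return p_hit / (p_hit + p_miss)


dist_t3 = [0.0, 0.16, 0.04, 0.48, 0.0, 0.32]
posterior = posterior_hit_probability(dist_t3, observed_state=2, num_states=3)
print(round(posterior, 2))
```

All worlds that are inconsistent with the observation (here, every world not ending in s3) are simply discarded before normalizing.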

108

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over location and time]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


6

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

7

Geo-Spatial Data

• Huge flood of geo-spatial data
• Modern technology
• New user mentality

• Great research potential
• New applications
• Innovative research
• Economic boost

• "$600 billion potential annual consumer surplus from using personal location data" [1]

[1] McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity. June 2011

8

Geo-Spatial Data

9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: Continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009.

11

Sources of Uncertainty

• Missing Observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.

12

Sources of Uncertainty

• Uncertain Observations
  • Imprecise sensor measurements (e.g. radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g. contradictory sensor data)
  • Human errors (e.g. in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain.

Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• $D$ be an uncertain database,
• $W$ be the set of possible worlds of $D$,
• $\varphi$ be a query predicate,
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w \in W$, and zero otherwise.

The probability that a query predicate $\varphi$ holds on an uncertain database $D$ is defined as

$P(\varphi, D) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$
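The definition above can be evaluated literally by enumerating every possible world, which is a useful baseline for small inputs. The following is a minimal sketch under stated assumptions: the three objects, their alternative positions and probabilities, the query point and the radius are all hypothetical values chosen for illustration, and objects are assumed to be mutually independent.

```python
from itertools import product
from math import dist

# Hypothetical discrete uncertain objects: each maps to a list of
# (position, probability) alternatives whose probabilities sum to one.
objects = {
    "A": [((1.0, 1.0), 0.8), ((4.0, 4.0), 0.2)],
    "B": [((0.5, 0.0), 0.2), ((5.0, 0.0), 0.8)],
    "C": [((0.0, 1.5), 0.4), ((6.0, 6.0), 0.6)],
}

def possible_worlds(objs):
    """Yield (world, probability) for every combination of object instances."""
    names = list(objs)
    for combo in product(*(objs[n] for n in names)):
        world = {n: pos for n, (pos, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence between objects is an assumption
        yield world, prob

def predicate_probability(objs, predicate):
    """Sum the probabilities of all worlds in which the predicate holds."""
    return sum(p for w, p in possible_worlds(objs) if predicate(w))

q, radius = (0.0, 0.0), 2.0  # hypothetical query point and range

def at_least_two_inside(world):
    return sum(dist(pos, q) <= radius for pos in world.values()) >= 2

prob = predicate_probability(objects, at_least_two_inside)
print(prob)
```

The loop visits one world per combination of alternatives, so its cost grows exponentially with the number of objects.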

Possible Worlds Example II

[Figure: possible worlds example with uncertain objects labeled A to Z]

22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects.

Are there efficient solutions to query uncertain spatial data?

In general: No.

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

This problem can be reduced to uncertain spatial databases.

But: Specific queries may have polynomial-time solutions.

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

29

Querying Uncertain Data: Running Example

Return the number of objects located in the depicted circular region centered at query point q.

This number is a random variable.

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with uncertain objects C, D, H and I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds: An intuition

Observation 1

For any possible world w and any possible world w' derived from w by changing the position of an object that lies outside the query region in both worlds, the query result is the same in w and w'.

32

Equivalent Worlds: An intuition

Observation 1 allows us to discard objects outside of the query region.

Remaining number of equivalence classes of possible worlds

33

Equivalent Worlds: An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside".

[Figure: query point q with the remaining objects C, D, H and I]

Querying Uncertain Spatial Data

34

Equivalent Worlds: An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside".

Remaining number of equivalence classes of possible worlds

[Figure: query point q with the remaining objects C, D, H and I]

35

Equivalent Worlds: An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

[Figure: query point q with the remaining objects C, D, H and I]

36

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results.

[Figure: query region around q with objects A, B and C; probability labels 0.8, 0.2 and 0.4]

37

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results.

Observation 3: Anonymize objects - substitute A, B, C by x.

[Figure: as before, with the object labels replaced by x; probability labels 0.8, 0.2 and 0.4]

39

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results.

Observation 3: Anonymize objects - substitute A, B, C by x.

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$.

40

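Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_{i=1}^{N} \bigl(p_i x + (1 - p_i)\bigr) = \sum_{i=0}^{N} c_i x^i$

In the expanded polynomial, the coefficient $c_i$ of monomial $x^i$ equals the probability that exactly $i$ objects are inside the query region.

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)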
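The polynomial expansion can be carried out one factor at a time, which keeps the work quadratic in the number of objects instead of exponential. The sketch below assumes independent objects; the function name and the three probabilities 0.8, 0.2 and 0.4 (taken from the labels of the running example, whose assignment to objects is not recoverable here) are illustrative.

```python
def count_distribution(inside_probs):
    """Expand prod_i (p_i * x + (1 - p_i)).

    Returns coeffs such that coeffs[k] is the probability that exactly
    k objects are inside the query region.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

dist_abc = count_distribution([0.8, 0.2, 0.4])
print(dist_abc)  # approximately [0.096, 0.472, 0.368, 0.064]
```

Each factor multiplication touches every coefficient once, so N objects cost O(N^2) operations in total.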

41

Count Queries on Uncertain Data

Example

[Figure: query region around q with objects A, B and C; probability labels 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each world.

The distribution of sampled results is an unbiased approximation of the true distribution of results.
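The sampling procedure can be sketched for the count query of the running example. This is an illustration only: the function name, the fixed seed and the three inside-probabilities are assumptions, and each sampled world is reduced to the count of objects inside the query region.

```python
import random

def sample_count_distribution(inside_probs, n_samples, seed=0):
    """Estimate P(exactly k objects inside) from n_samples sampled worlds."""
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    hist = [0] * (len(inside_probs) + 1)
    for _ in range(n_samples):
        # One possible world: every object is independently inside or outside.
        count = sum(rng.random() < p for p in inside_probs)
        hist[count] += 1
    return [h / n_samples for h in hist]

estimate = sample_count_distribution([0.8, 0.2, 0.4], n_samples=100)
print(estimate)
```

With only 100 samples the estimates can deviate noticeably from the exact values, which is why the sample size matters for reliability.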

48

Sampling: Example

[Figure: query region around q with objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

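Sampling: Confidences

[Figure: query region around q with objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution and $n$ is the number of samples.

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability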
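The interval on this slide can be reproduced with a few lines. The sketch below rests on two assumptions that are not stated in the surviving text: an estimator of 0.54 from 100 samples and a significance level of 0.05. These values are used because they yield the interval [0.442, 0.638] quoted above.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of N(0, 1)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs: estimator 0.54 from 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))
```

The half-width shrinks with the square root of the sample size, so quartering the width requires sixteen times as many sampled worlds.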

50

Uncertain Spatial Data Management: Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty - the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the X, Y and Time axes up to the present time, together with its 2D route]

Approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

• Sequence of (location, time) updates (e.g. GPS-based)

• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

• (location, time, velocity vector) updates

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: road segment between positions p1 and p2, observed at times t1 and t2, with query position q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3: Sheared Cylinders

Constant bound on location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the X and Y axes, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the X and Y axes, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations:

Adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically at t11, all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12), which in turn have children such as (Tri121, t11, t121, D121).]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q surrounded by the uncertainty disks of Tr1 to Tr5; the distances Rmin, Rmax and Rd from Q are marked]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_j = \int_0^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d$

where $pdf^{WD}_j$ is the derivative of $P^{WD}_j$; i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disk of TrQ surrounded by the uncertainty disks of Tr1 to Tr5; Rmin and Rmax are marked, and Z1 and Z2 are two possible locations with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Tr1, Tr2 and Trq over the (x, y) plane, each with a support of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

$P(\|V_i - V_q\| \le R_d)$

Define Viq = Vi - Vq (cross-correlation) as another random variable:
Viq = Vi + (-Vq)
which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq.

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane, each with a support of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi)
AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs).
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin as a function of t is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$

which is a hyperbola since A > 0.

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)).
Call such pair-wise intersections critical time-points.

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 plotted against time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three distance-over-time plots. Left: TR1 and TR2 with lower envelope LE12 and critical time-points t11, t12. Middle: TR3 and TR4 with lower envelope LE34 and critical time-point t31. Right: the merged lower envelope LE1234 over [tb, te], with the additional critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
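To make critical time-points and lower envelopes concrete, the sketch below works on squared distances, which are quadratics in t and have the same ordering as the distances themselves. It is a brute-force illustration, not the O(N log N) divide-and-conquer construction named on the slide: it collects all pairwise critical time-points and picks the closest trajectory on each elementary interval. All function names and the two example trajectories are hypothetical.

```python
from math import sqrt

def squared_distance(x0, y0, vx, vy):
    """Coefficients (a, b, c) of |(x0 + vx*t, y0 + vy*t)|^2 = a*t^2 + b*t + c."""
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two squared-distance quadratics are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - sqrt(disc)) / (2 * a),
                                     (-b + sqrt(disc)) / (2 * a)]
    return sorted({t for t in roots if tb <= t <= te})

def value(f, t):
    return f[0] * t * t + f[1] * t + f[2]

def lower_envelope(funcs, tb, te):
    """Return [(label, start, end), ...]: the closest trajectory per interval."""
    labels = list(funcs)
    cuts = {tb, te}
    for i, p in enumerate(labels):
        for r in labels[i + 1:]:
            cuts.update(critical_times(funcs[p], funcs[r], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2.0
        best = min(labels, key=lambda n: value(funcs[n], mid))
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], end)  # merge adjacent pieces
        else:
            envelope.append((best, start, end))
    return envelope

# Hypothetical relative motions (offset and velocity with respect to Trq).
funcs = {
    "Tr1": squared_distance(1.0, 0.0, 1.0, 0.0),   # moving away from the origin
    "Tr2": squared_distance(4.0, 0.0, -1.0, 0.0),  # approaching the origin
}
envelope = lower_envelope(funcs, 0.0, 5.0)
print(envelope)
```

Because the difference of two quadratics is itself at most quadratic, each pair contributes at most two critical time-points, in line with the observation above.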

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1 to TR7 over [tb, te] with the critical time-points t1, t2 and t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, each path Pj ∈ PPi(a) is selected with probability 1 / |PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with segment mp1 marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(S \cap PL_p(t))$  (e.g. uniform pdf)
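The combination of path probability and location probability can be sketched as follows. Everything here is an assumption made for illustration: positions are measured as offsets along a path, the possible locations on each path form one interval with a uniform pdf, and the path names, probabilities and intervals are hypothetical values chosen to reproduce the 0.5 · 0.5 = 0.25 example.

```python
def overlap_fraction(locations, segment):
    """Fraction of the interval `locations` covered by `segment`.

    Both arguments are (start, end) offsets along the same path; a uniform
    pdf over the possible locations is assumed.
    """
    lo = max(locations[0], segment[0])
    hi = min(locations[1], segment[1])
    length = locations[1] - locations[0]
    return max(0.0, hi - lo) / length if length > 0 else 0.0

def inside_probability(paths, segment_on_path):
    """Sum over possible paths of Pr(path) * Pr(location inside the segment)."""
    total = 0.0
    for name, (path_prob, locations) in paths.items():
        segment = segment_on_path.get(name)
        if segment is not None:  # the segment lies on this path
            total += path_prob * overlap_fraction(locations, segment)
    return total

# Two equally likely paths; the query segment lies on P1 only and covers
# half of the possible locations on P1 at the given time instant.
paths = {"P1": (0.5, (2.0, 6.0)), "P2": (0.5, (1.0, 5.0))}
segment_on_path = {"P1": (2.0, 4.0)}
prob_inside = inside_probability(paths, segment_on_path)
print(prob_inside)
```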

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  - The collection of all the possible paths (and probabilities)
  - Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1.

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
  - Edge hash-table
    › Small, in-memory
  - Movement R-tree
    › 1D for each edge
  - Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on-demand

76

› Network expansion
  - Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  - Fetch the movement R-trees of the edges in the expansion tree
  - Search leaf entries whose maximum time interval intersects with tq
  - The objects pointed to by those leaf entries are candidates

› Refinement
  - The candidate set is considerably smaller than the original trajectory dataset
  - Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E around q. Annotations: expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
  - Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  - Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  - The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty - the flip-side

› Data reduction
  - Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty - the other flip-side

› What are the important properties not to be distorted by reduction?
  - Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  - It is not just a "Z" axis… one cannot travel back in time
  - How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
- Probabilistic results
- Time dimension

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results
- Time dimension

82

Merge the approaches

› Idea
  - Discretize the time
  - For each time, a pdf (pmf) over the possible positions is given

› Examples
  - Nearest Neighbour Queries on uncertain moving objects
  - Similarity Search on uncertain time series

› Problem: Independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), with the query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) with the query window Q; the reachable positions are bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) over time (t) with the query window Q; the reachable positions are bounded by vmin and vmax]

Assuming uniform distribution…

87

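A highway example…

[Figure: position (pos) over time (t) with the query window Q; the labels 50 and 30 mark the share of possible positions inside Q at two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints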

88

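A highway example…

[Figure: position (pos) over time (t) with the query window Q; the labels 50 and 30 mark the share of possible positions inside Q at two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5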

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  - Discrete Time + Discrete Space (e.g. Markov Chain)
  - Discrete Time + Continuous Space (e.g. Harris Chain)
  - Continuous Time + Discrete Space (e.g. Markov Process)
  - Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 at the border]

92

A simple example

› Initial position

› After first hit: (0.4, 0.2, 0.4) over the three holes around the initial position

› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16)

› After 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

93

How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M = ( 0.4  0.6  0    0    0    0    0    0    0
      0.4  0.2  0.4  0    0    0    0    0    0
      0    0.4  0.2  0.4  0    0    0    0    0
      0    0    0.4  0.2  0.4  0    0    0    0
      0    0    0    0.4  0.2  0.4  0    0    0
      0    0    0    0    0.4  0.2  0.4  0    0
      0    0    0    0    0    0.4  0.2  0.4  0
      0    0    0    0    0    0    0.4  0.2  0.4
      0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this?

› First hit:
(0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit:
(0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit:
(0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
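The vector-matrix products above can be reproduced with plain Python lists. This is a minimal sketch: the helper names are illustrative, and the matrix is rebuilt from the transition probabilities given on the previous slide (0.4/0.2/0.4 for inner holes, 0.4/0.6 at the borders).

```python
def build_transition_matrix(n=9):
    """Row i holds the probabilities of moving from hole i to each hole."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: multiply the row vector dist by the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

M = build_transition_matrix()
start = [0.0] * 9
start[2] = 1.0  # the ball starts in the third hole

first = step(start, M)
second = step(first, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in after_40])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, so repeated hits move the distribution towards it regardless of the starting hole.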

95

Fusion of Model and Reality

› Discretization of time and space
  - E.g. treat intersections as states and add additional states on long streets
  - The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  - Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  - Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M = ( 0    0    1
      0.6  0    0.4
      0    0.8  0.2 )

[Figure: state graph over s1, s2 and s3 with the transition probabilities of M, unrolled over the time steps t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take.

Distribution of the car over (s1, s2, s3), starting in s1:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Restricted to the worlds that have not intersected the window at t = 2, the vector at t = 3 is (0, 0.16, 0.04), so only a probability mass of 0.04 never enters the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

ST - Window Queries

States: s1, s2, s3, s4

M- = ( 0    0    1    0
       0.6  0    0.4  0
       0    0.8  0.2  0
       0    0    0    1 )

M+ = ( 0  0  1    0
       0  0  0.4  0.6
       0  0  0.2  0.8
       0  0  0    1 )

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)
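The two matrices can be derived mechanically from M and the set of window states. The sketch below is one possible reading of the slide, with hypothetical function names: M- keeps the original transitions and an absorbing extra state, M+ redirects every transition into a window state to that extra state, and M+ is applied for transitions that arrive at a time inside the query interval.

```python
def step(dist, m):
    """Multiply the row vector dist by the square matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing state for 'hit' worlds."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        hit = sum(row[j] for j in window_states)
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(m, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus, m_plus = window_matrices(m, window_states)
    dist = list(start) + [0.0]
    if t_from <= 0:  # the start time itself lies inside the query interval
        for j in window_states:
            dist[-1] += dist[j]
            dist[j] = 0.0
    for t in range(1, t_to + 1):
        dist = step(dist, m_plus if t >= t_from else m_minus)
    return dist[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, t_from=2, t_to=3)
print(result)
```

The cost is one vector-matrix product per time step, independent of the number of possible paths.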

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  - 2|S| classes of equivalent worlds
  - One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  - One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M = ( 0    0    1
      0.6  0    0.4
      0    0.8  0.2 )

[Figure: states s1, s2 and s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

State vector over the classes (S1-, S2-, S3-, S1+, S2+, S3+):

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M- = ( M  0
       0  M )

M+ = ( 0  0  1    0    0    0
       0  0  0.4  0.6  0    0
       0  0  0.2  0    0.8  0
       0  0  0    0    0    1
       0  0  0    0.6  0    0.4
       0  0  0    0    0.8  0.2 )

M = ( 0    0    1
      0.6  0    0.4
      0    0.8  0.2 )

107

Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory passes the query window and O the observation of the object in s3:

$P(∎ \mid O) = \frac{P(O \mid ∎) \cdot P(∎)}{P(O)} = \frac{P(∎ \wedge O)}{P(O)} = \frac{P(∎ \wedge O)}{P(O \wedge ∎) + P(O \wedge \neg ∎)}$

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$P(∎ \mid O) = \frac{0.32}{0.32 + 0.04} = 0.89$
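The doubled state space and the final conditioning step can be sketched together. The function names are hypothetical, and the block structure of M- and M+ follows the reading that "-" classes hold worlds that have not yet intersected the window and "+" classes hold worlds that have.

```python
def step(dist, m):
    """Multiply the row vector dist by the square matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def doubled_matrices(m, window_states):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # Outside the query interval no world changes its hit status.
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = []
    for row in m:  # "-" rows: entering a window state switches to "+"
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        move = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + move)
    for row in m:  # "+" rows: a hit world stays a hit world
        m_plus.append(zero + row[:])
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: in s1, window not yet hit
dist = step(dist, m_minus)             # t = 1 (outside T = [2, 3])
dist = step(dist, m_plus)              # t = 2
dist = step(dist, m_plus)              # t = 3

# Second observation: the object is seen in s3 at t = 3 (index 2).
observed = 2
hit, miss = dist[3 + observed], dist[observed]
posterior = hit / (hit + miss)
print([round(p, 2) for p in dist], round(posterior, 2))
```

Conditioning only needs the two entries that belong to the observed state, which is why tracking the 2|S| classes is sufficient.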

108

Summary

› Pros
  - Allows answering time-parametrized queries according to possible worlds semantics
  - Considers location dependencies over time
  - Scales up very well, since it is purely based on sparse matrix multiplications
  - Natively extendable for uncertain observations
  - Seems to work adequately on real-world data (more validation needed)

› Cons
  - Discrete time and space
  - Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a share x of the possible trajectories of o may intersect Q)

[Figure: index entries over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds

› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)

› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


7

Geo-Spatial Data

• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality

• Great research potential
  • New applications
  • Innovative research
  • Economic boost

• "$600 billion potential annual consumer surplus from using personal location data" [1]

[1] McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity. June 2011

8

Geo-Spatial Data

9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From the database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

14

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete models

• Continuous models

Possible World Semantics

04

b

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called possible worlds

• Assumption: The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let
• $\mathcal{D}$ be an uncertain database
• $W$ be the set of possible worlds of $\mathcal{D}$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise

The probability that a query predicate $\varphi$ holds on an uncertain database $\mathcal{D}$ is defined as

$$P(\varphi, \mathcal{D}) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$

Possible World Semantics

Possible Worlds Example II

[Figure: example database with uncertain objects labelled A to Z]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial-time solutions

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic ldquobest guessrdquo

Expected Positions

Most-likely Positions

hellip

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects that lie outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded

q

C D

H

I

36

Generating Functions

[Figure: query region around q with uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4; the objects are then relabelled x]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_j x^j$ implies that the probability of having exactly $j$ results equals $c_j$

40

Generating Functions Formally

For each object $o_i$, let $p_i = P(o_i \text{ is inside the query region})$. Consider the following generating function [2]:

$$\mathcal{F}(x) = \prod_{i=1}^{n} (1 - p_i + p_i \cdot x) = \sum_{j=0}^{n} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
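A minimal sketch of the polynomial expansion, assuming independent objects. The probabilities 0.8, 0.2 and 0.4 are used for illustration only; which object each value belongs to is not recoverable here, and the function name is hypothetical.

```python
def count_distribution(inside_probs):
    """Expand prod_i (1 - p_i + p_i * x) and return the coefficient list c,
    where c[j] is the probability that exactly j objects are inside."""
    coeffs = [1.0]                          # the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)         # object outside: count stays j
            nxt[j + 1] += c * p             # object inside: count becomes j + 1
        coeffs = nxt
    return coeffs

probs = [0.8, 0.2, 0.4]                     # illustrative probabilities
dist = count_distribution(probs)
print([round(c, 3) for c in dist])          # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication costs time linear in the current degree, so the whole count distribution is obtained in O(n²) time instead of enumerating 2^n inside/outside combinations.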

41

Count Queries on Uncertain Data

[Figure: the running example (query point q, uncertain objects A, B, C and H, probability labels 0.8, 0.2 and 0.4) with the generating function expanded step by step]

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
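The four steps above can be sketched for the count query of the running example. The inside-probabilities, the seed and the function name are assumptions chosen for illustration, and objects are assumed to be independent.

```python
import random
from collections import Counter

def sample_count_distribution(inside_probs, n_samples, seed=0):
    """Draw n_samples possible worlds, count the objects inside the query
    region in each world, and return the relative frequency of each count."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        # One possible world: each object is independently inside or outside.
        k = sum(rng.random() < p for p in inside_probs)
        counts[k] += 1
    return {k: counts[k] / n_samples for k in range(len(inside_probs) + 1)}

probs = [0.8, 0.2, 0.4]                     # illustrative probabilities
estimate = sample_count_distribution(probs, n_samples=100)
print(estimate)
```

With only 100 worlds the estimates fluctuate noticeably from run to run, which is why a confidence statement is needed alongside them.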

48

Sampling Example

[Figure: running example with query point q and uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

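Sampling Confidences

[Figure: running example with query point q and uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability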
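A short sketch of the Wald interval. The estimate 0.54 and the level α = 0.05 are assumptions: they are inferred from the midpoint and width of the interval [0.442, 0.638] for 100 samples, and are not stated explicitly on the slide.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(p_hat=0.54, n=100)
print(round(low, 3), round(high, 3))              # 0.442 0.638
```

Quadrupling the number of sampled worlds halves the width of the interval, since the half-width shrinks with the square root of the sample size.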

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of equivalent worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, drawn up to the present time, together with its 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a point q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]; its children are nodes of the form (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each child in turn has children over sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q, the uncertainty disks of Tr1, …, Tr5, and the distances Rmin, Rmax and Rd]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \ne j} \left(1 - P^{WD}_{i,Q}(R_d)\right) dR_d$$

where $pdf^{WD}_{j,Q}$ is the derivative of $P^{WD}_{j,Q}$ with respect to $R_d$; i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
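The pruning observation can be sketched in a few lines. One assumption is made explicit here: Rmax is taken to be the smallest of the maximum possible distances between Q and any object. The disk coordinates and the function name are hypothetical.

```python
from math import hypot

def prune_candidates(query, disks):
    """disks maps a name to (cx, cy, r), an uncertainty disk.
    Returns the names that can still be the NN of the crisp point query,
    together with R_max (assumed: smallest maximum distance to the query)."""
    qx, qy = query
    min_dist, max_dist = {}, {}
    for name, (cx, cy, r) in disks.items():
        d = hypot(cx - qx, cy - qy)
        min_dist[name] = max(0.0, d - r)   # closest the object can be to Q
        max_dist[name] = d + r             # farthest the object can be from Q
    r_max = min(max_dist.values())
    survivors = sorted(n for n in disks if min_dist[n] <= r_max)
    return survivors, r_max

disks = {"Tr1": (2.0, 0.0, 1.0), "Tr2": (0.0, 3.0, 1.0), "Tr4": (10.0, 0.0, 1.0)}
survivors, r_max = prune_candidates((0.0, 0.0), disks)
print(survivors, r_max)                    # ['Tr1', 'Tr2'] 3.0
```

Only the surviving objects need to enter the integral for the NN probability; pruned objects contribute a factor of one to the product.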

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the (x, y) plane]

Example, assuming a uniform pdf of possible locations: only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P(\lVert V_i - V_q \rVert \le R_d)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) plotted over the (x, y) plane]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

where B and C are determined by the relative expected location at t = 0. This is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, starting at tb, with the critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
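Setting two distance functions equal and squaring both sides leaves a quadratic in t, hence at most two critical time-points. The sketch below assumes each distance function is given by the coefficients (A, B, C) of its squared distance A·t² + B·t + C; the example coefficients and function names are hypothetical.

```python
from math import sqrt

def critical_time_points(f, g, tb, te, eps=1e-12):
    """f and g are (A, B, C) triples of squared distance functions.
    Returns the times strictly inside (tb, te) where the two distances cross."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < eps:                       # difference is linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb < t < te]

def lower_envelope_of_two(f, g, tb, te):
    """Return pieces (t_start, t_end, winner), winner 0 for f and 1 for g."""
    def dist(h, t):
        return sqrt(max(0.0, h[0] * t * t + h[1] * t + h[2]))
    cuts = [tb] + critical_time_points(f, g, tb, te) + [te]
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)              # the order cannot change inside a piece
        pieces.append((lo, hi, 0 if dist(f, mid) <= dist(g, mid) else 1))
    return pieces

f = (1.0, -4.0, 5.0)    # squared distance (t - 2)^2 + 1
g = (0.25, -1.0, 3.0)   # squared distance 0.25 (t - 2)^2 + 2
envelope = lower_envelope_of_two(f, g, 0.0, 5.0)
print([(round(lo, 3), round(hi, 3), w) for lo, hi, w in envelope])
```

The pairwise routine is the merge step that a divide-and-conquer construction over all N distance functions would apply repeatedly.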

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 (of TR1 and TR2; critical time-points t11, t12) and the lower envelope LE34 (of TR3 and TR4; critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the critical time-points t1, t2 and t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4, including segment m]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S, t) = \sum_{p \in PP_a(t)} \Pr(p) \cdot \Pr(S \mid PL_p(t))$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree "bubbles" along road-network segments; annotated with the earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

vs.

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q; probability labels 50 % and 30 %]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q; probability labels 50 % and 30 %]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t) diagram with query region Q; probability labels 50 % and 30 %]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; transition probabilities 0.4, 0.2, 0.4 for an interior hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

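› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$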

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
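The three computations can be reproduced with plain vector-matrix products. This is a minimal sketch: the helper names are hypothetical, and the matrix is rebuilt from the transition probabilities of the board example.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example: an interior hole moves left or
    right with probability 0.4 and stays with 0.2; a border hole stays with
    0.4 and moves to its only neighbour with 0.6."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(state, M):
    """One hit: multiply the row vector 'state' with the matrix M."""
    n = len(M)
    return [sum(state[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(state, M, hits):
    for _ in range(hits):
        state = step(state, M)
    return state

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]        # ball starts in the third hole
print([round(p, 2) for p in after_hits(start, M, 1)])
print([round(p, 2) for p in after_hits(start, M, 2)])
print([round(p, 2) for p in after_hits(start, M, 40)])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is exactly invariant under M. After 40 hits the distribution is close to it, but not yet identical.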

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

State distribution over (s1, s2, s3), obtained by repeated multiplication with M:

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3 with the transition probabilities of M as edge labels]

102

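ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

State distribution over (s1, s2, s3), where the probability of worlds that have already intersected the window is not propagated further:

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0, 0.16, 0.04)

Result = 0.96

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3 with the transition probabilities of M as edge labels]

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices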

103

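ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distribution over (s1, s2, s3, s4) at t = 0, 1, 2, 3:

(1, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0.2, 0.8), (0, 0, 0.04, 0.96)

[Figure: transition graph over the states s1, s2, s3 and the additional winner state s4]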
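The construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, states are 0-based, and it is assumed that the start time t = 0 lies outside the window. M+ redirects every transition that ends inside the window to the absorbing winner state; M- only adds that absorbing state.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def build_window_matrices(M, window_states):
    """Return (M_minus, M_plus), each with one extra absorbing 'winner' state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M_minus: plain transitions, the winner state stays a winner.
    M_minus = [row[:] + [0.0] for row in M] + [absorbing]
    # M_plus: probability mass entering a window state moves to the winner state.
    M_plus = []
    for row in M:
        hit = sum(row[j] for j in window_states)
        M_plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    M_plus.append(absorbing)
    return M_minus, M_plus

def window_probability(M, start, window_states, window_times, horizon):
    """Probability of being in a window state at least once during window_times."""
    M_minus, M_plus = build_window_matrices(M, window_states)
    v = [0.0] * (len(M) + 1)
    v[start] = 1.0
    for t in range(1, horizon + 1):
        # Use M_plus when the target time t of the transition lies in the window.
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, 0, {0, 1}, {2, 3}, 3)
print(round(result, 4))
```

The cost is one vector-matrix product per tick, instead of one evaluation per possible path.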

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

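State distribution over the classes (S1-, S2-, S3-, S1+, S2+, S3+):

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3, with worlds marked as ∎ (window intersected) or not ∎]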

107

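Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory intersects the query window and obs denoting the observation in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg \blacksquare)}$$

State distribution at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0.16, 0.04, 0.48, 0, 0.32)

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$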
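A minimal sketch of the doubled state space and the Bayes step. The names are hypothetical and states are 0-based. It assumes the window covers s1 and s2 at t = 2 only, which is what reproduces the vectors shown on the slides.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def doubled_matrices(M, window_states):
    """Matrices over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    plus_rows = [zero + row for row in M]          # hit worlds stay hit worlds
    M_minus = [row + zero for row in M] + plus_rows
    top = []
    for row in M:
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        top.append(stay + hit)                     # entering the window: Si- -> Sj+
    M_plus = top + plus_rows
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
M_minus, M_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # object observed in s1 at t = 0, no hit yet
for t in (1, 2, 3):
    v = vec_mat(v, M_plus if t == 2 else M_minus)  # assumed window time: t = 2

# Second observation: the object is seen in s3 (index 2) at t = 3.
hit, miss = v[n + 2], v[2]
posterior = hit / (hit + miss)
print([round(x, 2) for x in v], round(posterior, 2))
```

Conditioning on the second observation only needs the two entries S3- and S3+ of the final vector, so no further matrix work is required.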

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index structure over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query and total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
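A minimal sketch of the sampling idea, applied to the window query from the earlier slides so that the estimate can be compared with the exact value 0.96. It draws trajectories forward from a single observation only; the adapted transition matrices needed to conform with further observations are not reproduced here. Function names, sample size and seed are assumptions.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_path(M, start, horizon, rng):
    """Draw one possible trajectory: the states at times 1..horizon."""
    states = list(range(len(M)))
    path, state = [], start
    for _ in range(horizon):
        state = rng.choices(states, weights=M[state])[0]
        path.append(state)
    return path

def estimate_window_probability(M, start, window_states, window_times,
                                horizon, n_samples, seed=0):
    """Ratio of sampled trajectories that are inside the window at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start, horizon, rng)
        if any(path[t - 1] in window_states for t in window_times):
            hits += 1
    return hits / n_samples

estimate = estimate_window_probability(M, 0, {0, 1}, (2, 3), 3, n_samples=20000)
print(estimate)
```

The same loop works for any predicate that can be evaluated on a single sampled trajectory, which is why sampling is attractive for queries such as kNN.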

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› Event query is a sequence of subevents
E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› Query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


8

Geo-Spatial Data

9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: Continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing Observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain Observations
  • Imprecise sensor measurements (e.g. radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g. contradictory sensor data)
  • Human errors (e.g. in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process

Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process

Improve the quality of modern location-based applications and of research results in the field

16

Uncertain Spatial Data Models

bull Discrete Models

bull Continuous Models

Possible World Semantics


17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let there be given
• an uncertain database,
• the set of possible worlds of this database,
• a query predicate, and
• an indicator function returning one if the predicate holds in a given world and zero otherwise

The probability that a query predicate holds on an uncertain database is defined as
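The formula itself is not preserved in this transcript. Under the definitions above, and with symbol names that are assumptions made here ($\mathcal{D}$ for the uncertain database, $\mathcal{W}$ for its set of possible worlds, $\varphi$ for the query predicate, $I$ for the indicator function), the usual possible-world definition would read:

$$P(\varphi, \mathcal{D}) = \sum_{w \in \mathcal{W}} I(\varphi, w) \cdot P(w)$$

That is, the probabilities of all possible worlds in which the predicate holds are added up.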

Possible World Semantics

Possible Worlds Example II

[Figure: uncertain objects labelled A to Z]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning / Aggregation

[Figure: query point q with objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query point q with objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

[Figure: query point q with objects C, D, H, I]

36

Generating Functions

[Figure: query point q with uncertain objects A, B, C (anonymized as x), H and H3; probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

Generating Functions Formally

For each object, consider the probability that it is located inside the query region, and consider the generating function of [2] built from these probabilities

In the expanded polynomial, the coefficient of the monomial of degree i equals the probability that exactly i objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
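The generating function itself is not preserved in this transcript. A common form, assumed here, is F(x) = Π_i (1 - p_i + p_i·x), where p_i is the probability that object i lies inside the query region. The sketch below expands this product. The three probabilities are illustrative values only, borrowed from the figure labels without knowing which object they belong to.

```python
def count_distribution(inside_probs):
    """Expand prod_i (1 - p_i + p_i * x); coeffs[k] = P(exactly k objects inside).

    Assumes the objects are mutually independent.
    """
    coeffs = [1.0]                      # the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: degree unchanged
            nxt[k + 1] += c * p         # object inside: degree increases by one
        coeffs = nxt
    return coeffs

dist = count_distribution([0.2, 0.8, 0.4])   # illustrative probabilities
print([round(c, 3) for c in dist])
```

Each multiplication costs time linear in the current degree, so the whole distribution is obtained in quadratic time in the number of objects instead of enumerating all possible worlds.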

41

Count Queries on Uncertain Data

[Figure: query point q with uncertain objects A, B, C, H and H3; probability labels 0.8, 0.2 and 0.4]

[Example: step-by-step expansion of the generating function for the depicted objects; the formulas are not recoverable]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and unbiased

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling Example

[Figure: query point q with uncertain objects A, B, C, H and H3; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

Sampling Confidences

[Figure: query point q with uncertain objects A, B, C, H and H3; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test, which uses a percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability
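The estimate and the significance level are not preserved in this transcript. The interval [0.442, 0.638] is reproduced by a Wald interval if one assumes an estimate of 0.54 from 100 samples and a significance level of 0.05; these three values are inferred here, not taken from the slide.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # percentile of the standard normal
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(0.54, 100)   # assumed estimate and sample size
print(round(low, 3), round(high, 3))
```

Quadrupling the number of samples halves the interval width, which gives a simple rule for choosing the sample size.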

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: 3D trajectory over the X, Y and Time axes up to the present time, together with its 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: timeline with the marker "now ->"]

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints
– Motion along road network
– Uncertainty restricted to road segments

[Figure: road-network example with positions p1 and p2, query point q, and the time instants t1, t, t2]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]. Its children are nodes of the form (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), …, (Trim, [t(m-1), te], Dm). Each child has descendants of the same form over sub-intervals, e.g. (Tri11, [tb, t11], D11) and (Tri12, [t11, t12], D12)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q with the uncertainty disks of Tr1, …, Tr5, the radii Rmin and Rmax, and a distance Rd]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
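The two formulas referred to above are not preserved in this transcript. A hedged reconstruction, following the verbal description and using symbol names chosen here for illustration ($pdf_i$ for the location pdf of $Tr_i$, $D_i$ for its uncertainty disk, $P^{WD}_i$ for the within-distance probability), could read:

$$P^{WD}_i(R_d) = \iint_{D_i \,\cap\, \mathrm{disk}(Q, R_d)} pdf_i(x, y)\, dx\, dy$$

$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d} \cdot \prod_{i \neq j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

The first expression is the double integral mentioned on the next slide; the second multiplies the density of $Tr_j$ being at distance $R_d$ with the probability that every other object is farther away.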

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ with the uncertainty disks of Tr1, …, Tr5, the radii Rmin and Rmax, and two points Z1, Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted as surfaces over the x-y plane]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (-Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted as surfaces over the x-y plane]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

a hyperbola, since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example of the lower envelope LE12 of the two distance functions of TR1 and TR2, with the critical time-points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]
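The "at most twice" observation can be illustrated with a small sketch. It assumes linear motion of the expected locations, so that each squared distance to the querying object is a quadratic A·t² + B·t + C in t, and setting two of them equal leaves a quadratic equation. Function names and the example data are hypothetical.

```python
from math import sqrt

def squared_distance_coeffs(p, v, pq, vq):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for linear motion.

    p, v: start position and velocity of the object; pq, vq: same for the query.
    """
    dx, dy = p[0] - pq[0], p[1] - pq[1]
    dvx, dvy = v[0] - vq[0], v[1] - vq[1]
    return (dvx * dvx + dvy * dvy,
            2.0 * (dx * dvx + dy * dvy),
            dx * dx + dy * dy)

def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions are equal (at most two)."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:                      # degenerates to a linear equation
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        s = sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Query object at rest in the origin; object 1 at rest, object 2 passing by.
c1 = squared_distance_coeffs((1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
c2 = squared_distance_coeffs((-3.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
print(critical_time_points(c1, c2, 0.0, 10.0))
```

Between two consecutive critical time-points the order of the two distance functions cannot change, which is what the lower-envelope construction exploits.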

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1, …, TR7 over [tb, te] with the critical time-points t1, t2, t3, showing the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with segment m]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\big(p \in S \mid p \in PL_P(a, t)\big)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; expansion tree "bubbling" along road-network segments]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time period?

84

A highway example…

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time period?

85

A highway example…

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

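A highway example…

[Figure: position (pos) over time (t) with query window Q; probabilities of 50% and 30% of being inside Q at two time instants]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65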

87

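A highway example…

[Figure: position (pos) over time (t) with query window Q; probabilities of 50% and 30% of being inside Q at two time instants]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints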

88

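A highway example…

[Figure: position (pos) over time (t) with query window Q; probabilities of 50% and 30% of being inside Q at two time instants]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: 0.5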

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

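› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole with probabilities 0.4 (left neighbour), 0.2 (stay), 0.4 (right neighbour); border hole with probabilities 0.4 (stay) and 0.6 (neighbour)]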

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

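› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$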

94

How can we model this

› First hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
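The propagation shown on this slide can be sketched in a few lines of plain Python. This is only an illustration: the helper names `build_transition_matrix`, `step` and `propagate` are hypothetical, and the matrix entries are the ones read off the transition matrix M above. Note that the 40-step distribution is only close to the limiting distribution (0.08, 0.12, …, 0.12, 0.08), which is a fixed point of M.

```python
def build_transition_matrix(n=9):
    """Board with n holes. Interior holes: left 0.4, stay 0.2, right 0.4.
    Border holes: stay 0.4, move to the single neighbour 0.6 (values from M)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit of the board: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, hits):
    """Apply `hits` transitions to the distribution `dist`."""
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third hole
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
print([round(p, 2) for p in after_1])
print([round(p, 2) for p in after_2])
print([round(p, 2) for p in after_40])
```

Because the matrix is sparse (at most three non-zero entries per row), a real implementation would store only the non-zero transitions; the dense lists here are chosen for readability.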

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with edges labeled by the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

State distribution at t=0: (1, 0, 0)

98

99

ST - Window Queries

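› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

State distribution at t=0: (1, 0, 0)

State distribution at t=1: (0, 0, 1)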

100

ST - Window Queries

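› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

State distribution at t=0: (1, 0, 0)

State distribution at t=1: (0, 0, 1)

State distribution at t=2: (0, 0.8, 0.2)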

101

ST - Window Queries

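› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

State distribution at t=0: (1, 0, 0)

State distribution at t=1: (0, 0, 1)

State distribution at t=2: (0, 0.8, 0.2)

State distribution at t=3: (0.48, 0.16, 0.36)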

102

ST - Window Queries

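› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

State distribution at t=0: (1, 0, 0)

State distribution at t=1: (0, 0, 1)

State distribution at t=2: (0, 0.8, 0.2)

State distribution at t=3 over the worlds that were not already in the window at t=2: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices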

103

ST - Window Queries

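$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distributions over (s1, s2, s3, s4), where s4 is the new state:

$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$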
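The two-matrix idea can be sketched as follows, under stated assumptions: the extra state collects all trajectories that have already been inside the window, M− behaves like M outside the query interval, and M+ redirects every transition into a window state to the extra state. The function names `vec_mat`, `augment` and `window_probability` are hypothetical, and the sketch assumes the query interval starts at t ≥ 1.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def augment(M, window):
    """Build M_minus and M_plus with one extra absorbing 'hit' state (last index)."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M_minus: ordinary transitions; the hit state only loops onto itself.
    M_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # M_plus: probability mass entering a window state is redirected to the hit state.
    M_plus = []
    for row in M:
        redirected = sum(p for j, p in enumerate(row) if j in window)
        M_plus.append([0.0 if j in window else p for j, p in enumerate(row)]
                      + [redirected])
    M_plus.append(absorbing)
    return M_minus, M_plus


def window_probability(M, start, window, t_start, t_end):
    """P(object is in a window state at least once during [t_start, t_end]).
    Assumes t_start >= 1, so the start distribution itself is never tested."""
    M_minus, M_plus = augment(M, window)
    dist = list(start) + [0.0]
    for t in range(1, t_end + 1):
        dist = vec_mat(dist, M_plus if t >= t_start else M_minus)
    return dist[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# States are 0-indexed here: s1 -> 0, s2 -> 1, s3 -> 2.
result = window_probability(M, [1.0, 0.0, 0.0], window={0, 1}, t_start=2, t_end=3)
print(round(result, 2))  # 0.96
```

The cost is one vector-matrix product per tick, independent of the exponential number of possible paths.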

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time; the first with one observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^{-}$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^{+}$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

106

Multiple Observations

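[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3; ∎ marks the event that the window has been intersected, so the classes $S_i^{-}$ correspond to not ∎]

State distributions over $(S_1^{-}, S_2^{-}, S_3^{-}, S_1^{+}, S_2^{+}, S_3^{+})$:

t=0: (1, 0, 0, 0, 0, 0)

t=1: (0, 0, 1, 0, 0, 0)

t=2: (0, 0, 0.2, 0, 0.8, 0)

t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$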

107

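Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$

Here ∎ is the event that the trajectory intersects the query window and obs is the observation of the object in s3.

State distribution at t=3 over $(S_1^{-}, S_2^{-}, S_3^{-}, S_1^{+}, S_2^{+}, S_3^{+})$: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89$$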
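A minimal sketch of this conditioning step, assuming the second observation is made at the last tick of the query interval: the distribution is tracked separately for worlds that have not yet hit the window (the "minus" classes) and worlds that have (the "plus" classes), and Bayes' theorem is applied to the observed state. The function name `conditional_hit_probability` and the 0-based state indices are illustrative choices, not part of the original slides.

```python
def conditional_hit_probability(M, start, window, t_start, t_end, observed_state):
    """P(window was hit during [t_start, t_end] | object observed in
    observed_state at time t_end). Assumes t_start >= 1 and a Markov chain M."""
    n = len(M)
    minus = list(start)   # worlds that have not intersected the window
    plus = [0.0] * n      # worlds that have intersected the window
    for t in range(1, t_end + 1):
        new_minus, new_plus = [0.0] * n, [0.0] * n
        in_query_time = t_start <= t <= t_end
        for i in range(n):
            for j in range(n):
                p = M[i][j]
                if p == 0.0:
                    continue
                # Worlds that already hit the window stay in the plus classes.
                new_plus[j] += plus[i] * p
                # Worlds entering a window state during the query time switch to plus.
                if in_query_time and j in window:
                    new_plus[j] += minus[i] * p
                else:
                    new_minus[j] += minus[i] * p
        minus, plus = new_minus, new_plus
    evidence = plus[observed_state] + minus[observed_state]
    if evidence == 0.0:
        raise ValueError("observation has probability zero under the model")
    return plus[observed_state] / evidence, minus, plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
prob, minus, plus = conditional_hit_probability(
    M, [1.0, 0.0, 0.0], window={0, 1}, t_start=2, t_end=3, observed_state=2)
print(round(prob, 2))  # 0.89
```

Keeping two vectors of length |S| is equivalent to the 2|S|-state formulation with the block matrices M− and M+; the split form simply avoids building the larger matrices.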

108

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: labels “P”, “oid”, “time”, “location”]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


9

10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • “Find friends that attended the same concert last Saturday”

• Best case: continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009

11

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

14

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called possible worlds

• Assumption: the probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

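Let
• $\mathcal{D}$ be an uncertain database
• $\mathcal{W}$ be the set of possible worlds of $\mathcal{D}$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise

The probability that a query predicate $\varphi$ holds on an uncertain database $\mathcal{D}$ is defined as

$$P(\varphi, \mathcal{D}) = \sum_{w \in \mathcal{W}} I(\varphi, w) \cdot P(w)$$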
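The definition above, the sum of the probabilities of all possible worlds in which the predicate holds, can be written down directly as a brute-force enumeration. The sketch below assumes a discrete model with mutually independent objects, so that the probability of a world is the product of the chosen instance probabilities; the names `possible_worlds` and `query_probability`, the one-dimensional positions and the toy database are all made up for illustration.

```python
from itertools import product


def possible_worlds(db):
    """Enumerate (world, probability) pairs.
    db maps an object name to a list of (position, probability) instances;
    objects are assumed to be independent of each other."""
    names = list(db)
    for combo in product(*(db[name] for name in names)):
        world = {name: position for name, (position, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob


def query_probability(db, predicate):
    """Sum the probabilities of all worlds in which the predicate holds."""
    return sum(prob for world, prob in possible_worlds(db) if predicate(world))


def inside(position):
    """Toy query region: the interval [0, 1] on a line."""
    return 0.0 <= position <= 1.0


db = {
    "A": [(0.5, 0.8), (3.0, 0.2)],
    "B": [(0.9, 0.4), (5.0, 0.6)],
}
p_both = query_probability(db, lambda w: all(inside(x) for x in w.values()))
p_any = query_probability(db, lambda w: any(inside(x) for x in w.values()))
print(round(p_both, 2), round(p_any, 2))
```

The number of enumerated worlds is the product of the instance counts of all objects, which is why this direct evaluation does not scale.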

Possible World Semantics

Possible Worlds Example II

[Figure: uncertain objects labeled A to Z]

20

Possible Worlds Example II

[Figure: uncertain objects labeled A to Z]

21


22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic “best guess”

Expected Positions

Most-likely Positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

[Figure: query region around q with objects C, D, H, I]

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

[Figure: query region around q with objects C, D, H, I]

36

Generating Functions

[Figure: query region around q; objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results

37

Generating Functions

[Figure: query region around q; objects anonymized as x; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e., substitute A, B, C by x

39

Generating Functions

[Figure: query region around q; objects anonymized as x; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e., substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

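Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$\mathcal{F}(x) = \prod_{i=1}^{N} (1 - p_i + p_i x) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)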
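As a rough sketch of the generating-function technique, the polynomial can be expanded one factor at a time, where each object contributes a factor (1 − p) + p·x and p is the probability that the object lies inside the query region. The function name `count_distribution` is hypothetical, and the probabilities 0.8, 0.2 and 0.4 are simply the values visible in the running example's figure, used here as an assumed input.

```python
def count_distribution(probs):
    """Coefficients c_0 .. c_N of prod_i ((1 - p_i) + p_i * x).
    c_k is the probability that exactly k objects are inside the query region,
    assuming the objects are mutually independent."""
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object outside: count stays k
            nxt[k + 1] += c * p       # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each factor is multiplied in time linear in the current polynomial length, so N objects take O(N²) operations instead of enumerating 2^N worlds.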

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
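A minimal sketch of this sampling scheme for a count query, assuming independent objects that are each inside the query region with a known probability: every sampled world is obtained by flipping one biased coin per object, and the relative frequencies of the observed counts estimate the result distribution. The function name `sample_count_distribution`, the seed and the input probabilities are illustrative assumptions.

```python
import random


def sample_count_distribution(probs, num_samples, seed=0):
    """Estimate P(exactly k objects inside the query region) for k = 0..N
    by drawing num_samples independent possible worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    counts = [0] * (len(probs) + 1)
    for _ in range(num_samples):
        # One possible world: each object is inside with its own probability.
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [c / num_samples for c in counts]


estimates = sample_count_distribution([0.8, 0.2, 0.4], num_samples=20000)
print([round(e, 3) for e in estimates])
```

With only 100 samples, as on the following slides, the estimates fluctuate noticeably between runs, which is the motivation for attaching confidence intervals to them.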

48

Sampling Example

[Figure: query region around q; objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

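Sampling Confidences

[Figure: query region around q; objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability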
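The Wald interval for a sampled probability can be computed with the standard library alone. In the sketch below, the function name `wald_interval` is hypothetical, and the inputs p̂ = 0.54, n = 100 and α = 0.05 are assumptions: they are not stated in the surviving slide text, but they reproduce the interval [0.442, 0.638] quoted above.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n),
    with z the (1 - alpha/2) quantile of the standard normal distribution."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed values: 54 of 100 sampled worlds satisfy the predicate, alpha = 5%.
low, high = wald_interval(0.54, 100, alpha=0.05)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The Wald interval is known to be unreliable for estimates close to 0 or 1 and for small sample sizes, so other interval constructions may be preferable in those cases.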

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data cleaning: “best guess” answers; unreliable results; biased results

Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: positions p1 and p2 observed at times t1 and t2, a position q in between, and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, tb1, t1, t11 and te]

Observations:

Adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) cover sub-intervals of their parent]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q; uncertainty disks of Tr1, Tr2, Tr3, Tr4, Tr5; distances Rmin, Rmax and Rd; annotation “Rmin > Rmax”]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1, Tr2, Tr3, Tr4, Tr5; distances Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: axes x, y, pdf; uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over uncertainty disks of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of possible locations; shown are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: axes x, y, pdf; pdf(Tr1 - Trq) and pdf(Tr2 - Trq) with support diameter 4r and peak 3/(4πr²); x-axis ticks at -40 and +40]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

hyperbola, since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points

[Figure: Example. Lower envelope LE12 of two distance-trajectories TR1 and TR2, with the critical time-points t11 and t12 marked after tb; distance (dist) on the vertical axis, time on the horizontal axis]
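Why two distance functions meet at most twice can be illustrated with a short sketch. It assumes linear motion, so that the relative position of object i with respect to the query is (x0 + Vx·t, y0 + Vy·t) and the squared distance is the quadratic A·t² + B·t + C with A = Vx² + Vy². This closed form is an assumption consistent with the remark "hyperbola since A > 0", not a formula taken from the slides, and all function names and example values are hypothetical. Setting two squared distances equal gives a quadratic equation in t, hence at most two critical time-points.

```python
from math import sqrt


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for the relative
    position (x0 + vx * t, y0 + vy * t)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def squared_distance(coeffs, t):
    """Evaluate the squared distance at time t."""
    a, b, c = coeffs
    return a * t * t + b * t + c


def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal.
    Equal distances are equivalent to equal squared distances."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # parallel or identical functions: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        r = sqrt(disc)
        roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


ci = squared_distance_coeffs(0.0, 1.0, 1.0, 0.0)    # d^2 = t^2 + 1
cj = squared_distance_coeffs(-4.0, 0.0, 2.0, 0.0)   # d^2 = (2t - 4)^2
critical = critical_time_points(ci, cj, tb=0.0, te=5.0)
print([round(t, 3) for t in critical])
```

Between two consecutive critical time-points the order of the two distance functions cannot change, which is what makes a merge-based construction of the lower envelope possible.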

70

The continuous aspect of the probabilistic NN queries

Q: how to efficiently construct the lower envelope of the whole collection of distance functions?

A: divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with the critical time-points t1, t2 and t3, showing the lower envelope, the 2nd lower envelope (the nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band.]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, each of the possible paths in PPi(a) is equally likely.

Given a path Pj ∈ PPi(a), the possible locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities ...

[Figure: possible paths and possible locations of object a when t = 4, with the segment m p1 highlighted.]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):

Pr(a in S at t) = sum over p in PP(t) of Pr(p) · Pr(S ∩ PL_p^a(t))

74

Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree that is rooted in q and contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q. The expansion tree "bubbles" along the road-network segments. Given the earliest/latest times of arrival, one path is possible but definitely not part of the answer to the range query.]

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink, "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
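As an illustration of "define an acceptable error, save on data size", here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is the textbook version, not the sped-up variant of the cited paper, and the sample track and the tolerance `epsilon` are hypothetical values.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot point stays on the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point would otherwise appear twice


# Hypothetical location samples of one trip.
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(track, epsilon=0.5)
```

The dropped points are exactly the location uncertainty that the sink has to tolerate: every original sample lies within `epsilon` of the simplified polyline.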

79

Uncertainty – the other flip-side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis ... one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far ...

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

vs.

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar, "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example ...

[Figure: position (pos) of a car over time (t) on a highway, with the query window Q and the speed bounds vmin and vmax. Under a uniform distribution, Q covers 50% of the possible positions at one time step and 30% at the next.]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution ...

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

This violates the speed constraints.

Consideration of dependency: = 0.5
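The two numbers can be reproduced with a few lines of arithmetic. The independence formula is the one on the slide. The dependent case is only a sketch under an assumption that is not stated in the extracted text: every possible trajectory that is inside Q at the second time step was already inside Q at the first one (nested events), which is one way to arrive at 0.5.

```python
# Probabilities of being inside Q at the two time steps (from the slide).
p_first, p_second = 0.5, 0.3

# Independence assumption: the two time steps are treated as unrelated.
p_independent = 1 - (1 - p_first) * (1 - p_second)

# Assumed dependency: worlds inside Q at the second step are a subset of
# the worlds inside Q at the first step, so the union is the larger event.
p_dependent = max(p_first, p_second)
```

Treating the positions at different times as independent counts trajectories that no car could drive within the speed bounds, which is why the independent estimate is too high here.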

89

An adequate model

90

Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

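› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of the ball. Inner hole: 0.4 to the left, 0.2 stay, 0.4 to the right. Border hole: 0.4 stay, 0.6 to the neighbouring hole.]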

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4 0.6 0   0   0   0   0   0   0   )
( 0.4 0.2 0.4 0   0   0   0   0   0   )
( 0   0.4 0.2 0.4 0   0   0   0   0   )
( 0   0   0.4 0.2 0.4 0   0   0   0   )
( 0   0   0   0.4 0.2 0.4 0   0   0   )
( 0   0   0   0   0.4 0.2 0.4 0   0   )
( 0   0   0   0   0   0.4 0.2 0.4 0   )
( 0   0   0   0   0   0   0.4 0.2 0.4 )
( 0   0   0   0   0   0   0   0.6 0.4 )

94

How can we model this?

› First hit:
(0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit:
(0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit:
(0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
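The hit-by-hit computation is a repeated vector-matrix product. The sketch below rebuilds the 9-bucket transition matrix of the example in plain Python (the helper names are hypothetical) and reproduces the vectors after the first and second hit. It also checks that the vector shown for the 40th hit is a fixed point of M, i.e. the distribution the chain converges towards; the value after exactly 40 hits is computed but deliberately not asserted.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6                    # left border bucket
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4    # right border bucket
    return m


def step(v, m):
    """One hit: row vector v times matrix m."""
    n = len(v)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]
first = step(start, M)
second = step(first, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

# The vector shown on the slide for the 40th hit.
stationary = [0.08] + [0.12] * 7 + [0.08]
fixed_point_error = max(
    abs(a - b) for a, b in zip(step(stationary, M), stationary)
)
```

Because the matrix is tridiagonal, each step only touches neighbouring buckets, which is the sparsity the later slides rely on for scalability.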

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle, Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0   0   1   )
( 0.6 0   0.4 )
( 0   0.8 0.2 )

[Figure: state diagram of s1, s2 and s3 with the transition probabilities of M, unrolled over t=0, t=1, t=2, t=3.]

Note: We have an exponential number of possible paths the car might take

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

Only the worlds that are outside the window at t=2 (probability 0.2, in s3) still matter at t=3. They contribute (0, 0.16, 0.04), so only the 0.04 that stays in s3 never enters the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

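ST - Window Queries

States s1, s2, s3, plus the new state s4 that collects the worlds which have hit the window:

M− =
( 0   0   1   0 )
( 0.6 0   0.4 0 )
( 0   0.8 0.2 0 )
( 0   0   0   1 )

M+ =
( 0 0 1   0   )
( 0 0 0.4 0.6 )
( 0 0 0.2 0.8 )
( 0 0 0   1   )

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)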
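The two-matrix construction can be sketched as follows. The helper names are hypothetical. The sketch assumes that M− is applied for ticks that lead to a time outside the window interval T and M+ for ticks that lead to a time inside T, with probability mass entering a window state redirected to the extra state s4.

```python
def step(v, m):
    """Row vector times matrix: one tick of the Markov chain."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def absorbing_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(m)
    absorbing_row = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing_row]
    m_plus = []
    for row in m:
        kept = [0.0 if j in window_states else p for j, p in enumerate(row)]
        redirected = sum(p for j, p in enumerate(row) if j in window_states)
        m_plus.append(kept + [redirected])
    m_plus.append(absorbing_row)
    return m_minus, m_plus


def window_probability(m, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus, m_plus = absorbing_matrices(m, window_states)
    v = [float(x) for x in start] + [0.0]
    if t_from == 0:
        # Worlds that start inside the window count as hits immediately.
        for j in window_states:
            v[-1] += v[j]
            v[j] = 0.0
    for t in range(1, t_to + 1):
        v = step(v, m_plus if t >= t_from else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# States are 0-indexed: s1 -> 0, s2 -> 1, s3 -> 2.
result = window_probability(M, [1, 0, 0], {0, 1}, t_from=2, t_to=3)
```

The cost is one vector-matrix product per tick, independent of the exponential number of possible paths.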

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0   0   1   )
( 0.6 0   0.4 )
( 0   0.8 0.2 )

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3.]

106

Multiple Observations

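State vectors over the classes (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds that have not hit the window (not ∎) and the classes Si+ hold the worlds that have (∎):

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M =
( 0   0   1   )
( 0.6 0   0.4 )
( 0   0.8 0.2 )

M− =
( M 0 )
( 0 M )

M+ =
( 0 0 1   0   0   0   )
( 0 0 0.4 0.6 0   0   )
( 0 0 0.2 0   0.8 0   )
( 0 0 0   0   0   1   )
( 0 0 0   0.6 0   0.4 )
( 0 0 0   0   0.8 0.2 )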

107

Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (event ∎), given the fact that the object was seen in s3 (observation obs)?

P(∎ | obs) = P(obs | ∎) · P(∎) / P(obs)
           = P(∎ ∧ obs) / P(obs)
           = P(∎ ∧ obs) / (P(obs ∧ ∎) + P(obs ∧ ¬∎))
           = 0.32 / (0.32 + 0.04) = 0.89

State vector at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)
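A minimal sketch of the doubled state space and the Bayes step follows. The helper names are hypothetical. It assumes that the classes are ordered (S1-, S2-, S3-, S1+, S2+, S3+), that the window is {s1, s2} during T = [2, 3] as in the window-query example, and that the second observation (object seen in s3) is made at t = 3.

```python
def step(v, m):
    """Row vector times matrix: one tick of the Markov chain."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """M- and M+ over the 2|S| classes (all 'not hit' first, then all 'hit')."""
    n = len(m)
    zero = [0.0] * n
    hit_rows = [zero + list(row) for row in m]       # once hit, always hit
    m_minus = [list(row) + zero for row in m] + hit_rows
    m_plus = []
    for row in m:
        stay = [0.0 if j in window_states else p for j, p in enumerate(row)]
        hit = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    m_plus += hit_rows
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # t=0: object observed in s1, no hit yet
v = step(v, m_minus)                  # t=1 lies outside T
v = step(v, m_plus)                   # t=2 lies inside T
v = step(v, m_plus)                   # t=3 lies inside T

# Condition on the assumed observation "object is in s3 at t=3":
# index 2 is S3- (no hit), index 5 is S3+ (hit).
p_hit_given_s3 = v[5] / (v[2] + v[5])
```

Keeping the hit and not-hit worlds in separate classes is what allows the final conditioning: the observation selects the two s3 classes, and their ratio gives the posterior.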

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (labelled P and oid) in the time/location space.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle, Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel, Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3), 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu, Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle, Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle, Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel, Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3), 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel, Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu, Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle, Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar, "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu, Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


10

Spatio-Temporal Data

• (object, location, time) triples

• Queries
  • "Find friends that attended the same concert last Saturday"

• Best case: continuous function

GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm, Hidden Markov Map Matching Through Noise and Sparseness. ACMGIS 2009

11

Sources of Uncertainty

• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From the database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process

Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process

Improve the quality of modern location-based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete models

• Continuous models

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called possible worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• D be an uncertain database
• W be the set of possible worlds of D
• φ be a query predicate
• I(φ, w) be an indicator function returning one if predicate φ holds in world w ∈ W and zero otherwise

The probability P(φ, D) that the query predicate φ holds on the uncertain database D is defined as

P(φ, D) = Σ_{w ∈ W} I(φ, w) · P(w)

Possible Worlds Example II

[Figure: uncertain spatial objects labelled A to Z.]

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects.

Are there efficient solutions to query uncertain spatial data?

In general: No.

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

This can be reduced to uncertain spatial databases.

But: Specific queries may have polynomial-time solutions.

[DalviSuciu04] Dalvi, N. N. and Suciu, D., Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

Querying Uncertain Data: Running Example

[Figure: circular query region centered at the query point q.]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with the uncertain objects C, D, H and I.]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected positions

Most-likely positions

...

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world w and any possible world w' derived from w by changing the position of an object that stays outside of the query region, the query returns the same result in w and in w'

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

[Figure: query point q with the remaining objects C, D, H and I.]

Remaining number of equivalent classes of possible worlds

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

36

Generating Functions

[Figure: query point q and the uncertain objects A, B, C and H, annotated with the probabilities 0.8, 0.2 and 0.4.]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects: substitute A, B, C by x

Each monomial c · x^i implies that the probability of having exactly i results equals c

40

Generating Functions: Formally

For each object oi, let pi denote the probability that oi is inside the query region. Consider the following generating function [2]:

F(x) = Π_i (1 - pi + pi · x)

In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.

[2] Jian Li, Barna Saha and Amol Deshpande, A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1), 502-513 (2009)
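The polynomial expansion can be sketched in a few lines. The sketch assumes independent objects and multiplies one factor (1 - p) + p·x per object, keeping only the list of coefficients. The function name is hypothetical, and reading the three figure labels 0.8, 0.2 and 0.4 as the inside-probabilities of three objects is an assumption about the figure.

```python
def count_distribution(probabilities):
    """Coefficients of prod_i((1 - p_i) + p_i * x).

    Entry k of the result is the probability that exactly k objects
    are inside the query region (objects assumed independent).
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)      # object is outside: count unchanged
            nxt[k + 1] += c * p        # object is inside: count + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities, read off the figure labels.
dist = count_distribution([0.8, 0.2, 0.4])
```

Each factor is multiplied in time linear in the current degree, so n objects take O(n^2) operations instead of enumerating 2^n combinations of inside/outside.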

41

Count Queries on Uncertain Data

Example

[Figure: query point q and the uncertain objects A, B, C and H, annotated with the probabilities 0.8, 0.2 and 0.4, next to the step-by-step expansion of the generating function.]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the predicate on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of the sampled results is an unbiased approximation of the true distribution of results
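A minimal sketch of this sampling scheme for the count query: each sampled world decides independently for every object whether it is inside the query region, and the relative frequencies of the counts approximate the true distribution. The function name, the independence of the objects, the inside-probabilities 0.8, 0.2 and 0.4 and the sample size are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_count_distribution(probabilities, n_samples, rng):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    counts = Counter()
    for _ in range(n_samples):
        # One possible world: each object is inside with its own probability.
        inside = sum(1 for p in probabilities if rng.random() < p)
        counts[inside] += 1
    return [counts[k] / n_samples for k in range(len(probabilities) + 1)]


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = sample_count_distribution([0.8, 0.2, 0.4], 20000, rng)
```

With only 100 samples, as on the next slide, the estimates fluctuate visibly from run to run, which is why a confidence statement is needed alongside the point estimate.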

48

Sampling Example

[Figure: query point q and the uncertain objects A, B, C and H, annotated with the probabilities 0.8, 0.2 and 0.4.]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of the estimations

49

Sampling Confidences

[Figure: query point q and the uncertain objects A, B, C and H, annotated with the probabilities 0.8, 0.2 and 0.4.]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators, e.g., the Wald test:

p̂ ± z · sqrt(p̂ (1 - p̂) / n)

where z is the (1 - α/2) percentile of the standard normal distribution

At a significance level of α, the true probability is in the interval [0.442, 0.638]

True probability
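The interval can be reproduced with the standard Wald formula. The estimate 0.54 and the level α = 0.05 used below are not stated in the extracted text; they are inferred here because, together with the 100 samples mentioned on the slide, they yield exactly the interval [0.442, 0.638]. Treat them as assumptions.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed estimator value (the midpoint of the interval on the slide).
low, high = wald_interval(0.54, 100)
```

Since the half-width shrinks with 1/sqrt(n), halving the interval width requires four times as many sampled worlds.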

50

Uncertain Spatial Data Management: Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of equivalent worlds
• Efficient solution for the most prominent types of spatial queries
• Example: generating functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory along the Time axis, up to the present time.]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman, Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: time line marked "now ->".]

55

Modeling Uncertainty

Location uncertainty models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze, On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov, Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li, Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: positions p1 at time t1 and p2 at time t2 on a road network, with the query point q and an intermediate time t.]

Bart Kuijpers, Walied Othman, Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain, Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g., possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann, Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, ..., TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen, Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, ..., TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]. Its children are the nodes (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), ..., (Trim, [t(m-1), te], Dm). Each child has children over sub-intervals of its own interval, e.g., (Tri11, [tb, t11], D11) and (Tri12, [t11, t12], D12) below the first child, and so on for the deeper levels.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

[Figure: crisp query point Q and objects Tr1 to Tr5 whose possible locations are bounded by circles; Rmin, Rmax and a distance Rd are marked, together with the annotation Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
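One common way to write the two quantities described above is sketched here. It assumes independent objects and that pdf_i denotes the location pdf of Tri; the symbols P^{WD}, pdf^{WD} and the disk B(Q, R_d) are notation introduced for this sketch and are not taken from the slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp point Q
P^{WD}_{i,Q}(R_d) = \iint_{B(Q, R_d)} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d AND all others are farther away,
% integrated over all R_d up to R_max
P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d)
               \prod_{i \neq j} \bigl(1 - P^{WD}_{i,Q}(R_d)\bigr)\, dR_d,
\qquad
pdf^{WD}_{j,Q}(R_d) = \frac{d}{dR_d} P^{WD}_{j,Q}(R_d)
```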

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ with Rmin and Rmax; objects Tr1 to Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, with the labels 2r and 1/(πr²)]

Example, assuming a uniform pdf of possible locations: only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq are shown

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(‖Vi – Vq‖ ≤ Rd).

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane, with ticks at -40, 0 and +40 and the labels 4r and 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
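Property 1 can be checked numerically on a discretized version of the problem. The following is a minimal sketch, assuming independent locations and uniform pmfs over the integer grid points of a disk; the helper names `difference_pmf`, `centroid` and `uniform_disk` are hypothetical and only serve this illustration.

```python
from itertools import product

def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete V_i and V_q.

    Each pmf maps a 2D grid point (x, y) to its probability. Summing over
    all pairs is the convolution of pmf_i with the mirrored pmf of V_q.
    """
    out = {}
    for ((xi, yi), pi), ((xq, yq), pq) in product(pmf_i.items(), pmf_q.items()):
        key = (xi - xq, yi - yq)
        out[key] = out.get(key, 0.0) + pi * pq
    return out

def centroid(pmf):
    """Expected value (centroid) of a discrete 2D pmf."""
    cx = sum(p * x for (x, _), p in pmf.items())
    cy = sum(p * y for (_, y), p in pmf.items())
    return cx, cy

def uniform_disk(cx, cy, r):
    """Uniform pmf over the integer grid points of a disk (illustrative helper)."""
    pts = [(x, y)
           for x in range(cx - r, cx + r + 1)
           for y in range(cy - r, cy + r + 1)
           if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
    return {pt: 1.0 / len(pts) for pt in pts}

pmf_i = uniform_disk(10, 4, 3)   # possible locations of Tr_i, centroid (10, 4)
pmf_q = uniform_disk(2, 1, 3)    # possible locations of Tr_q, centroid (2, 1)
pmf_iq = difference_pmf(pmf_i, pmf_q)
c_iq = centroid(pmf_iq)          # expected to be Ci - Cq = (8, 3)
```

The support of the resulting pmf has radius 2r around Ciq, which matches the doubled extent of the convolved pdf.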

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

(a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i)
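One way to write such a distance function, assuming linear motion and writing (x_{iq}, y_{iq}) for the expected location along TRiq at t = 0 (a symbol introduced here for illustration), is the following. The coefficient names B and C are also assumptions; only A appears in the text.

```latex
d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\, t)^2 + (y_{iq} + V^{y}_{iq}\, t)^2}
          = \sqrt{A t^2 + B t + C},
\qquad
A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2, \quad
B = 2\,(x_{iq} V^{x}_{iq} + y_{iq} V^{y}_{iq}), \quad
C = x_{iq}^2 + y_{iq}^2
```

A is a sum of squares, so A > 0 whenever the relative velocity is non-zero.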

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example of the lower envelope LE12 of two distance-trajectories TR1 and TR2, with critical time-points t11 and t12 after tb. NOTE: time dimension on the horizontal axis, distance (dist) on the vertical axis]
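The critical time-points can be illustrated with a small sketch. It assumes that each squared distance function has the form A·t² + B·t + C (linear motion), so that equating two of them yields at most a quadratic equation. The envelope routine below is a naive quadratic-time illustration, not the divide-and-conquer construction of the tutorial, and all function names are hypothetical.

```python
import math

def dist_sq_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for linear motion."""
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_times(f, g, tb, te):
    """Times in [tb, te] where two distance functions are equal (at most two)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:                       # difference is linear or constant
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

def nn_intervals(funcs, tb, te):
    """Naive lower envelope: list of (name, start, end) of the closest object."""
    names = list(funcs)
    cuts = {tb, te}
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            cuts.update(critical_times(funcs[n1], funcs[n2], tb, te))
    cuts = sorted(cuts)
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2                  # the order is fixed between cuts
        best = min(names, key=lambda n: funcs[n][0] * mid * mid
                   + funcs[n][1] * mid + funcs[n][2])
        if result and result[-1][0] == best:
            result[-1] = (best, result[-1][1], hi)   # merge adjacent pieces
        else:
            result.append((best, lo, hi))
    return result

funcs = {
    "TR1": dist_sq_coeffs(1.0, 0.0, 1.0, 0.0),    # moves away from the origin
    "TR2": dist_sq_coeffs(5.0, 0.0, -1.0, 0.0),   # moves towards the origin
}
envelope = nn_intervals(funcs, 0.0, 4.0)
```

Comparing squared distances gives the same ordering as comparing distances, which avoids the square root in the comparison.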

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with m marked]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \cap PL^{a}_{p}(t)\big)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on-demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" from q along road-network segments through the vertices A, B, C, D, E, annotated with earliest/latest times of arrival; one route is a possible path but definitely not part of the answer to the range query]

77

Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
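The "acceptable error" idea can be illustrated with the classic recursive Douglas-Peucker simplification. This is a minimal sketch of the basic algorithm applied to 2D points, not the sped-up variant from the cited work; the sample route and the tolerance value are made up for the example.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, epsilon):
    """Drop points so that every dropped point stays within epsilon
    of the segment that replaces it."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]                 # everything in between is dropped
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right                 # avoid duplicating the split point

route = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(route, epsilon=0.5)
```

The tolerance epsilon plays the role of the acceptable error: a larger value transmits fewer points but increases the location uncertainty at the receiver.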

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not a "Z" axis… one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension

vs

Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t) with the query region Q; the reachable positions are bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

[Figure: the car is inside Q with probability 50% at one time instant and with probability 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g. Markov Chain)
  – Discrete Time + Continuous Space (e.g. Harris Chain)
  – Continuous Time + Discrete Space (e.g. Markov Process)
  – Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
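The propagation above amounts to repeated vector-matrix multiplication. A minimal pure-Python sketch follows, assuming the 9-bucket board and the transition probabilities of the matrix M; the helper names `step` and `board_matrix` are hypothetical. The vector shown for the 40th hit is a fixed point of M, i.e. the stationary distribution that repeated hits approach.

```python
def step(dist, matrix):
    """One hit: multiply the row vector dist with the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                            # inner bucket: left, stay, right
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket

first = step(start, M)
second = step(first, M)

dist = start
for _ in range(40):                      # distribution after the 40th hit
    dist = step(dist, M)

# Vector shown for the 40th hit; it satisfies stationary * M = stationary
stationary = [0.08] + [0.12] * 7 + [0.08]
```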

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 with their transition probabilities, unrolled over the time steps t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

Distribution over (s1, s2, s3):
  t = 0: (1, 0, 0)
  t = 1: (0, 0, 1)
  t = 2: (0, 0.8, 0.2)
  t = 3: (0.48, 0.16, 0.36) when all worlds are propagated
  t = 3: (0, 0.16, 0.04) when only the worlds that did not hit the window at t = 2 are propagated

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state (s4) for the winner trajectories and two matrices, M– and M+

103

ST - Window Queries

M– =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M+ =
( 0  0  1    0   )
( 0  0  0.4  0.6 )
( 0  0  0.2  0.8 )
( 0  0  0    1   )

Distribution over (s1, s2, s3, s4):

(1 0 0 0) → (0 0 1 0) → (0 0 0.2 0.8) → (0 0 0.04 0.96)
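The two-matrix scheme can be sketched in a few lines. The version below assumes that M+ is used for every transition that leads into a time instant of the query window and M– otherwise, and that the window does not start at time 0; the function names are hypothetical.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def window_matrices(M, window_states):
    """Build (M_minus, M_plus) with one extra absorbing 'winner' state."""
    n = len(M)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        # Probability mass entering a window state is redirected to the winner state
        hit = sum(row[j] for j in window_states)
        m_plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(M, start, window_states, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = window_matrices(M, window_states)
    vec = list(start) + [0.0]
    # Assumption: t_from >= 1, so the start distribution itself is not checked
    for t in range(1, t_to + 1):
        vec = vec_mat(vec, m_plus if t >= t_from else m_minus)
    return vec[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1; window = {s1, s2} (indices 0 and 1) during T = [2, 3]
p_hit = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

Because the winner state is absorbing, worlds that have hit the window once are never counted twice, so the paths do not have to be enumerated.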

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; one panel with a single observation at t0, one panel with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

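Distribution over the classes (S1–, S2–, S3–, S1+, S2+, S3+):

t = 0: (1 0 0 0 0 0)
t = 1: (0 0 1 0 0 0)
t = 2: (0 0 0.2 0 0.8 0)
t = 3: (0 0.16 0.04 0.48 0 0.32)

[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3, with the label "not ∎"]

M– =
( M  0 )
( 0  M )

M+ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )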

107

Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory passes the query window and obs denoting the observation:

P(∎ | obs) = P(obs | ∎) · P(∎) / P(obs)
           = P(∎ ∧ obs) / P(obs)
           = P(∎ ∧ obs) / (P(obs ∧ ∎) + P(obs ∧ ¬∎))

With the distribution (0 0.16 0.04 0.48 0 0.32) over (S1–, S2–, S3–, S1+, S2+, S3+):

P(∎ | obs) = 0.32 / (0.32 + 0.04) = 0.89
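The doubled state space and the conditioning step can be sketched as follows. This is a minimal illustration, assuming that the window {s1, s2} is only checked at t = 2 (an assumption that reproduces the vector (0 0.16 0.04 0.48 0 0.32)) and that the second observation places the object in s3 at t = 3; the function names are hypothetical.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def doubled_matrices(M, window_states):
    """Return (M_minus, M_plus) over the classes (S1-..Sn-, S1+..Sn+)."""
    n = len(M)
    zeros = [0.0] * n
    # M_minus: block diagonal, no world changes its hit/no-hit class
    m_minus = [row[:] + zeros for row in M] + [zeros + row[:] for row in M]
    m_plus = []
    for row in M:
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + hit)        # entering the window moves to a '+' class
    m_plus += [zeros + row[:] for row in M]
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

vec = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # t = 0: in s1, window not hit
vec = vec_mat(vec, m_minus)              # t = 1
vec = vec_mat(vec, m_plus)               # t = 2: window time (assumption)
vec = vec_mat(vec, m_minus)              # t = 3

observed = 2                             # object seen in s3 at t = 3
n = len(M)
p_hit_and_obs = vec[n + observed]        # class S3+
p_miss_and_obs = vec[observed]           # class S3-
p_hit_given_obs = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
```

Only the two classes that agree with the observation enter the final ratio; all other classes have probability zero once the observation is known.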

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entry labeled "Poid" over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
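The sampling idea can be illustrated on the small three-state chain of the window-query example. The sketch below covers only the simple case of a single observation at time 0, so it draws trajectories directly from the a-priori transition matrix; it does not implement the adaptation of transition matrices to several observations. Function names, the sample count and the seed are arbitrary choices for the illustration.

```python
import random

def sample_path(M, start_state, steps, rng):
    """Draw one possible trajectory (list of states) from the Markov chain."""
    path = [start_state]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def estimate_window_probability(M, start_state, window_states,
                                t_from, t_to, samples, seed=0):
    """Fraction of sampled trajectories that hit the window during [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        path = sample_path(M, start_state, t_to, rng)
        if any(path[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / samples

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1 (index 0); window = {s1, s2} during T = [2, 3]
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, samples=20000)
```

The estimate should lie close to the exact value 0.96 obtained with the matrix-based computation; its accuracy improves with the number of samples.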

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


11

Sources of Uncertainty

• Missing Observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks

Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

12

Sources of Uncertainty

• Uncertain Observations
  • Imprecise sensor measurements (e.g. radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g. contradictory sensor data)
  • Human errors (e.g. in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

13


Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• DB be an uncertain database having possible worlds,
• W be the set of possible worlds of DB,
• φ be a query predicate,
• I(φ, w) be an indicator function returning one if predicate φ holds in world w and zero otherwise.

The probability that a query predicate φ holds on an uncertain database DB is defined as

$$P(\varphi(DB)) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$

Possible Worlds Example II

[Figure: uncertain spatial objects labeled A to Z]

27

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning: Aggregation

[Figure: query region around q with the objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic "best guess"
  – Expected positions
  – Most-likely positions
  – …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds: An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with the remaining objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

[Figure: query region around q with the objects C, D, H, I]

36

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

[Figure: query region around q with the objects A, B, C, H and H3 and the probabilities 0.8, 0.2 and 0.4]

Observation 3: Anonymize objects – substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

Generating Functions: Formally

For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i} \big(1 - p_i + p_i \cdot x\big)$$

In the expanded polynomial, the coefficient of monomial x^k equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
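The expansion of such a generating function can be sketched with plain polynomial multiplication. The example below assumes three independent objects that lie inside the query region with probabilities 0.8, 0.2 and 0.4; reading the values of the figure in this way is an assumption made for the illustration, and the function name is hypothetical.

```python
def count_distribution(probabilities):
    """Coefficients c[k] = P(exactly k objects are inside the query region).

    Multiplies the factors (1 - p + p*x) one object at a time, so the cost is
    quadratic in the number of objects instead of exponential.
    """
    coeffs = [1.0]                               # the polynomial "1"
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)              # object outside: degree unchanged
            nxt[k + 1] += c * p                  # object inside: degree + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
# dist[k] is the coefficient of x^k: approximately 0.096, 0.472, 0.368, 0.064
```

The coefficients sum to one, since every possible world falls into exactly one count class.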

41

Count Queries on Uncertain Data

Example

[Figure: query region around q with the objects A, B, C, H and H3 and the probabilities 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling Example

[Figure: query region around q with the objects A, B, C, H and H3 and the probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test, where the critical value is a percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability
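A Wald-type confidence interval for a sampled probability can be computed as sketched below. The inputs are assumptions: 54 of 100 sampled worlds satisfying the predicate and a significance level of 0.05 are not stated in the text, but they reproduce the interval [0.442, 0.638] quoted above.

```python
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    p_hat = successes / n
    # (1 - alpha/2) percentile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(54, 100)       # hypothetical sample outcome
```

The interval width shrinks with the square root of the number of samples, so halving it requires about four times as many sampled worlds.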

50

Uncertain Spatial Data Management: Summary

Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty

Data Cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results

Paradigm of Equivalent Worlds
  – Efficient solution for the most prominent types of spatial queries
  – Example: Generating Functions

Approximations
  – Monte-Carlo sampling
  – Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: axes X, Y and Time with the present time marked; a 3D trajectory and the corresponding 2D route.]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
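The bead (space-time prism) model can be illustrated with a membership test: a location is possible at time t only if it is reachable from the earlier sample and the later sample is still reachable from it, both at the maximum speed. The sketch assumes free-space Euclidean motion; the name `in_bead` and the sample values are illustrative.

```python
from math import hypot


def in_bead(p, t, sample_a, sample_b, v_max):
    """True if location p is possible at time t between two exact samples."""
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    x, y = p
    reachable_from_a = hypot(x - xa, y - ya) <= v_max * (t - ta)
    can_still_reach_b = hypot(xb - x, yb - y) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b


# Hypothetical samples (x, y, t) and speed bound
a, b = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
on_track = in_bead((5.0, 0.0), 5.0, a, b, v_max=2.0)
too_far = in_bead((5.0, 9.0), 5.0, a, b, v_max=2.0)
```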

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure labels: p1, p2, q, t1, t2, t.]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X and Y, with time slices at T = tb, T = t1 and T = te.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 with uncertainty over the axes X and Y, with time slices at T = tb, T = tb1, T = t1, T = t11 and T = te.]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: the IPAC-NN tree. The root is [TrQ, tb, te]. Its children are (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each child has descendants of the same form over sub-intervals of its own interval, e.g., (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12) on the next level and (Tri121, t11, t121, D121) below that.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5; the distances Rmin, Rmax and Rd are marked, and for Tr4 the annotation reads Rmin > Rmax.]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_{i}(R_d) = \iint_{\mathrm{disk}(Q, R_d)} pdf_i(x, y)\, dx\, dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \prod_{i \neq j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$

where $pdf^{WD}_{j}$ is the derivative of $P^{WD}_{j}$, i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
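The pruning observation for a crisp query point can be sketched in a few lines. The sketch assumes circular uncertainty regions given as (center x, center y, radius); the name `prune_candidates` and the example disks are illustrative.

```python
from math import hypot


def prune_candidates(q, disks):
    """disks: {name: (cx, cy, radius)}; return the names that may still be the NN of q."""
    bounds = {}
    for name, (cx, cy, r) in disks.items():
        d = hypot(cx - q[0], cy - q[1])
        bounds[name] = (max(0.0, d - r), d + r)  # (min-distance, max-distance)
    # Rmax: the smallest max-distance; some object is certainly at most this far
    r_max = min(hi for _, hi in bounds.values())
    return sorted(n for n, (lo, _) in bounds.items() if lo <= r_max)


# Hypothetical uncertainty disks around a query point at the origin
disks = {"Tr1": (1.0, 0.0, 1.0), "Tr2": (0.0, 2.0, 1.0), "Tr4": (10.0, 0.0, 1.0)}
candidates = prune_candidates((0.0, 0.0), disks)
```

Only the surviving candidates need the expensive integration; objects such as Tr4 in this example are dropped without evaluating any pdf.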

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin and Rmax marked; for two possible locations Z1 and Z2, dist(Z1, Z2) < Rmax.]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) drawn over the x-y plane; each has base diameter 2r and height 1/(πr²).]

Example: assuming a uniform pdf of the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
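The convolution argument can be illustrated on a discretized, one-dimensional stand-in. This is a simplifying assumption: the slide works with continuous 2D pdfs. The helper names and the example pmfs are illustrative.

```python
def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def negate(pmf):
    """pmf of -V, given the pmf of V."""
    return {-v: p for v, p in pmf.items()}


# Hypothetical uniform location pmfs on an integer grid
v_i = {x: 1.0 / 3.0 for x in (4, 5, 6)}
v_q = {x: 1.0 / 3.0 for x in (0, 1, 2)}
v_iq = convolve(v_i, negate(v_q))  # pmf of Viq = Vi + (-Vq)

# Probability that the object is within distance 3 of the query object
p_within = sum(p for d, p in v_iq.items() if abs(d) <= 3)
```

Once the pmf of the difference is available, the uncertain query object can be treated as a crisp point at the origin, which is what makes the crisp-query machinery reusable.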

67

[Figure: the pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the x-y plane (axis ticks at -40, 0 and +40); each has base diameter 4r and height 3/(4πr²).]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$

which is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)
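The coefficients of such a distance function follow directly from the relative start position and the relative velocity. The sketch below assumes linear motion within the time interval; the names `distance_coefficients`, `distance_at`, B and C are illustrative choices.

```python
from math import sqrt


def distance_coefficients(rel_pos, rel_vel):
    """(A, B, C) such that the distance from the origin is sqrt(A t^2 + B t + C)."""
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    a = vx * vx + vy * vy            # positive unless the relative velocity is zero
    b = 2.0 * (x0 * vx + y0 * vy)
    c = x0 * x0 + y0 * y0
    return a, b, c


def distance_at(coeffs, t):
    a, b, c = coeffs
    return sqrt(max(0.0, a * t * t + b * t + c))  # clamp tiny negative round-off


# Hypothetical relative start position (3, 4) and relative velocity (-1, 0)
coeffs = distance_coefficients((3.0, 4.0), (-1.0, 0.0))
```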

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: [Figure: the distance functions of TR1 and TR2 plotted over time (horizontal axis), with their lower envelope LE12 over tb, t11, t12 and a critical time-point marked.]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
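For small inputs the lower envelope can also be obtained with a naive construction, sketched below. This is deliberately not the divide-and-conquer method from the slide: it collects all pairwise critical time-points and picks the lowest function on every interval in between, which is quadratic in N. Distance functions are assumed to be given as coefficient triples (A, B, C) of sqrt(A t² + B t + C); all names are illustrative.

```python
from math import sqrt


def critical_times(f, g, tb, te, eps=1e-12):
    """Times strictly inside (tb, te) at which two distance functions are equal."""
    a, b, c = (f[k] - g[k] for k in range(3))  # compare squared distances
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            roots = [(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)]
    return [t for t in roots if tb < t < te]


def lower_envelope(funcs, tb, te):
    """funcs: {name: (A, B, C)}; returns [(name, t_start, t_end), ...]."""
    names = sorted(funcs)
    cuts = {tb, te}
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            cuts.update(critical_times(funcs[n1], funcs[n2], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0  # no crossing inside (lo, hi), so one sample suffices
        best = min(names, key=lambda n: funcs[n][0] * mid * mid + funcs[n][1] * mid + funcs[n][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)  # extend the previous piece
        else:
            envelope.append((best, lo, hi))
    return envelope


# Hypothetical distance functions: Tr1 is closest at t = 0, Tr2 at t = 5
funcs = {"Tr1": (1.0, 0.0, 1.0), "Tr2": (1.0, -10.0, 26.0)}
envelope = lower_envelope(funcs, 0.0, 10.0)
```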

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: the distance functions of TR1, …, TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r and the 2nd lower envelope (the nodes in Level 2 of the IPAC-NN tree) are marked.]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t)
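Enumerating the possible paths between two samples can be sketched as a bounded depth-first search. The sketch assumes a directed graph with the minimum travel time on each edge and considers simple paths only; the vertex names, edge costs and the function name `possible_paths` are hypothetical.

```python
def possible_paths(graph, source, target, budget):
    """graph: {u: {v: min_travel_time}}; simple paths whose cost is <= budget."""
    found = []

    def extend(path, cost):
        node = path[-1]
        if node == target:
            found.append((tuple(path), cost))
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path and cost + w <= budget:
                path.append(nxt)
                extend(path, cost + w)
                path.pop()

    extend([source], 0.0)
    return found


# Hypothetical network; the two samples are 8 time units apart
graph = {
    "v1": {"v5": 6.0, "v3": 3.0, "v2": 5.0},
    "v3": {"v4": 2.0},
    "v4": {"v5": 2.0},
    "v2": {"v5": 5.0},
}
paths = possible_paths(graph, "v1", "v5", budget=8.0)
uniform_probability = 1.0 / len(paths)  # uniform pdf over the possible paths
```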

73

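Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4, with the segment mp1 highlighted.]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\left(a(t) \in S \mid p\right)$

where the second factor is taken over the possible locations PL_p(t) on path p, e.g., with a uniform pdf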

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: query point q and the road-network vertices A, B, C, D, E; annotations: "expansion tree 'bubbling' along road-network segments", "earliest/latest times of arrival", and "possible path, but definitely not part of the answer to the range query".]
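The network expansion step can be approximated with a distance-bounded Dijkstra search. The sketch simplifies the slide's setting: it assumes the query location q is a vertex and returns the reached vertices with their network distances rather than partial edges. The graph and the name `expansion_tree` are illustrative.

```python
import heapq


def expansion_tree(graph, q, r):
    """Return {vertex: network distance} for all vertices closer than r to q."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical road network with edge lengths
graph = {
    "q": {"A": 1.0, "B": 4.0},
    "A": {"q": 1.0, "C": 1.5},
    "B": {"q": 4.0},
    "C": {"A": 1.5},
}
reached = expansion_tree(graph, "q", r=3.0)
```

Only the edges incident to the reached vertices then need their movement R-trees fetched during filtering.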

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error; save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
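The trade-off between an acceptable error and the data size can be illustrated with the basic recursive Douglas-Peucker simplification. This is the textbook version, not the sped-up variant from the cited paper, and it ignores the time dimension; the example polyline and the error bound are hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[worst - 1] <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:worst + 1], epsilon)
    right = douglas_peucker(points[worst:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical polyline, acceptable error 0.5
track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(track, epsilon=0.5)
```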

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension

vs.

Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

83

A highway example…

[Figure, built up over six animation steps: a position (pos) over time (t) diagram of a car on a highway, with the speed bounds vmin and vmax and a query region Q; under the uniform assumption the car is inside Q with probability 50% at one time and 30% at another.]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

This violates the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole.]

92

A simple example

› Initial position

› After the first hit: 0.4, 0.2, 0.4

› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
0.4 0.6 0   0   0   0   0   0   0
0.4 0.2 0.4 0   0   0   0   0   0
0   0.4 0.2 0.4 0   0   0   0   0
0   0   0.4 0.2 0.4 0   0   0   0
0   0   0   0.4 0.2 0.4 0   0   0
0   0   0   0   0.4 0.2 0.4 0   0
0   0   0   0   0   0.4 0.2 0.4 0
0   0   0   0   0   0   0.4 0.2 0.4
0   0   0   0   0   0   0   0.6 0.4
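The evolution of the ball's position can be reproduced with plain vector-matrix products. The sketch below rebuilds the transition matrix of the slide; the helper names `build_matrix` and `step` are illustrative, and the ball is assumed to start in the third bucket.

```python
def build_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(state, m):
    """One hit: multiply the row vector `state` with the matrix m."""
    n = len(m)
    return [sum(state[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
initial = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first = step(initial, M)
second = step(first, M)

state = initial
for _ in range(40):  # distribution after the 40th hit
    state = step(state, M)
```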

94

How can we model this

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure, built up over six animation steps: the states s1, s2, s3 with their transition probabilities, unrolled over the time steps t = 0, 1, 2, 3.]

Note: we have an exponential number of possible paths the car might take

Starting in s1 at t = 0, the state distribution evolves as
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Removing the worlds that already intersected the window at t = 2 leaves (0, 0.16, 0.04) at t = 3, and only the 0.04 in s3 never intersects the window. Result = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

M− =
0   0   1   0
0.6 0   0.4 0
0   0.8 0.2 0
0   0   0   1

M+ =
0 0 1   0
0 0 0.4 0.6
0 0 0.2 0.8
0 0 0   1

(1, 0, 0, 0) · M− = (0, 0, 1, 0)
(0, 0, 1, 0) · M+ = (0, 0, 0.2, 0.8)
(0, 0, 0.2, 0.8) · M+ = (0, 0, 0.04, 0.96)

States: s1, s2, s3, s4
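The two-matrix construction can be sketched as follows: one extra absorbing state collects every world that has entered the window, M− is used for transitions to time steps outside the window and M+ for transitions to time steps inside it. The sketch assumes that the window does not include t = 0; all function names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m, window_states):
    """Build M- and M+ with one additional absorbing 'hit' state."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        hit = sum(p for j, p in enumerate(row) if j in window_states)
        kept = [0.0 if j in window_states else p for j, p in enumerate(row)]
        m_plus.append(kept + [hit])  # mass entering the window is redirected
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(m, start, window_states, window_times, horizon):
    m_minus, m_plus = window_matrices(m, window_states)
    v = list(start) + [0.0]
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1]  # probability mass collected in the absorbing state


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# States are 0-indexed: s1 -> 0, s2 -> 1; window T = [2, 3]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, horizon=3)
```

The cost is one sparse vector-matrix product per time step, independent of the exponential number of possible paths.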

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two diagrams of location space over time: one with a single observation at t0, and one with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to the worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to the worlds where o is located in state si and o has intersected the window

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3.]

106

Multiple Observations

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3.]

Vector components: S1-, S2-, S3-, S1+, S2+, S3+

(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0.16, 0.04, 0.48, 0, 0.32)

M− =
( M 0 )
( 0 M )

M+ = [6 × 6 matrix; its entries are garbled in this transcript]

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

107

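Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Writing hit for the event ∎ (the trajectory intersects the window) and obs for the observation:

$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \land \text{obs})}{P(\text{obs})} = \frac{P(\text{obs} \land \text{hit})}{P(\text{obs} \land \text{hit}) + P(\text{obs} \land \lnot \text{hit})}$

With the vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$\frac{0.32}{0.32 + 0.04} = 0.89$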
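The conditioning step can be sketched with a doubled state space. The sketch assumes the window {s1, s2} at T = [2, 3] from the earlier window-query example and orders the vector as (S1-, S2-, S3-, S1+, S2+, S3+); with this assumed window, the intermediate vector may be laid out differently from the one on the slide, while the conditional probability comes out the same. All names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    m_minus = [list(row) + zero for row in m] + [zero + list(row) for row in m]
    m_plus = []
    for row in m:  # rows of the "-" classes: entering the window switches to "+"
        stay = [0.0 if j in window_states else p for j, p in enumerate(row)]
        hit = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    m_plus += [zero + list(row) for row in m]  # "+" classes stay "+"
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # the object starts in s1, no hit yet
for t in (1, 2, 3):
    v = vec_mat(v, m_plus if t in (2, 3) else m_minus)

n, seen = 3, 2  # second observation: the object is seen in s3 (index 2) at t = 3
posterior = v[n + seen] / (v[seen] + v[n + seen])
```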

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: location over time diagram.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


12

Sources of Uncertainty

• Uncertain Observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)

From a database perspective, the position of a mobile object is uncertain

Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011

13

Research Challenge

Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process

Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process

Improve the quality of modern location-based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called possible worlds

• Assumption: the probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• DB be an uncertain database
• W be the set of possible worlds of DB
• φ be a query predicate
• I(φ, w) be an indicator function returning one if predicate φ holds in world w, and zero otherwise

The probability that a query predicate φ holds on an uncertain database DB is defined as

$P(\varphi, DB) = \sum_{w \in W} P(w) \cdot I(\varphi, w)$
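The definition can be evaluated literally by enumerating all possible worlds, which is only feasible for tiny databases. The sketch assumes mutually independent objects, each given as a list of (instance, probability) alternatives, so that the probability of a world is the product of its instance probabilities; the names and the 1D example are illustrative.

```python
from itertools import product


def predicate_probability(objects, predicate):
    """objects: {name: [(instance, probability), ...]}; sums P(w) over the worlds where the predicate holds."""
    names = list(objects)
    total = 0.0
    for combo in product(*(objects[n] for n in names)):
        world = {n: inst for n, (inst, _) in zip(names, combo)}
        p_world = 1.0
        for _, p in combo:
            p_world *= p  # independence between objects (assumption)
        if predicate(world):
            total += p_world
    return total


# Hypothetical 1D database: each object has two alternative positions
objects = {"A": [(1, 0.8), (9, 0.2)], "B": [(2, 0.2), (8, 0.8)]}


def some_object_near_origin(world):
    return any(abs(pos) <= 3 for pos in world.values())


p = predicate_probability(objects, some_object_near_origin)
```

The loop visits every combination of instances, so its running time grows exponentially with the number of objects.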

Possible Worlds Example II

[Figure, built up over several animation steps: uncertain objects labelled A through Z.]

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q.]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning: Aggregation

[Figure: query point q with the objects C, D, H and I.]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected positions

Most-likely positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds: An intuition

[Figure, repeated across five slides: query point q with its query region; the later steps show only the remaining objects C, D, H and I.]

Observation 1

For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds

Observation 1 allows discarding the objects outside of the query region

Remaining number of equivalent classes of possible worlds

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded

36

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

[Figure: query region around q; the candidate objects A, B and C carry the probability labels 0.8, 0.2 and 0.4]

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_i \cdot x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

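Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left( (1 - p_i) + p_i \cdot x \right) = \sum_{j=0}^{N} c_j \cdot x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)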
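The polynomial expansion can be sketched in a few lines. The function name is hypothetical, and the input probabilities are the values 0.8, 0.2 and 0.4 read off the figure labels; which object carries which label is an assumption that does not change the result.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = N)] for independent objects.

    probs[i] is the probability that object i lies inside the query region.
    `coeffs` holds the coefficients of the polynomial expanded so far.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object outside: exponent unchanged
            nxt[j + 1] += c * p      # object inside: exponent grows by one
        coeffs = nxt
    return coeffs

# Probabilities taken from the figure labels (assignment to objects assumed).
dist = count_distribution([0.8, 0.2, 0.4])
```

Each object costs one pass over the current coefficient list, so the whole distribution is obtained in O(N²) time instead of enumerating 2^N inside/outside combinations.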

41

Count Queries on Uncertain Data

Example

[Figure: query region around q; the candidate objects A, B and C carry the probability labels 0.8, 0.2 and 0.4]
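As a worked instance of the generating function, assume the three candidate objects lie inside the query region with probabilities 0.8, 0.2 and 0.4 (the values shown in the figure; their assignment to A, B and C is an assumption and does not affect the result):

$$\begin{aligned}
F(x) &= (0.2 + 0.8x)\,(0.8 + 0.2x)\,(0.6 + 0.4x) \\
     &= (0.16 + 0.68x + 0.16x^2)\,(0.6 + 0.4x) \\
     &= 0.096 + 0.472x + 0.368x^2 + 0.064x^3
\end{aligned}$$

Under these assumptions the count query returns 0, 1, 2 or 3 objects with probability 0.096, 0.472, 0.368 and 0.064, respectively.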

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds, such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of sampled results is an unbiased approximation of the true distribution of results
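A minimal sketch of this sampling scheme for the count query follows. The function name, the fixed seed and the probabilities 0.8, 0.2 and 0.4 (taken from the running example's figure labels) are assumptions for illustration.

```python
import random
from collections import Counter

def sample_count_distribution(probs, num_samples, seed=0):
    """Estimate the distribution of the count query by sampling possible worlds.

    Each sampled world decides independently, per object, whether the object
    falls inside the query region (with probability probs[i]).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = Counter()
    for _ in range(num_samples):
        inside = sum(1 for p in probs if rng.random() < p)
        counts[inside] += 1
    return [counts[k] / num_samples for k in range(len(probs) + 1)]

# 100 sampled worlds, as on the slides; the estimates vary with the seed.
estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
```

With only 100 samples the estimates can deviate noticeably from the exact probabilities, which motivates attaching confidence intervals to them.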

48

Sampling: Example

[Figure: query region around q; the candidate objects A, B and C carry the probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

Sampling: Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]

True probability
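The interval on this slide can be reproduced with a short sketch of the Wald interval. The estimate 0.54 is an assumption: it is inferred as the midpoint of the reported interval, since the estimator itself did not survive in the slide text. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Return the (lower, upper) bounds of the Wald confidence interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed estimate 0.54 from |S| = 100 sampled worlds.
low, high = wald_interval(0.54, 100)
```

Rounded to three digits, this gives the interval [0.442, 0.638]. The Wald interval relies on a normal approximation, so it is less trustworthy for small sample sizes or for estimates close to 0 or 1.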

50

Uncertain Spatial Data Management: Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: axes X, Y and Time with the present time marked; shows the 2D route and the 3D trajectory]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

• Sequence of (location, time) updates (e.g. GPS-based)

• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

• (location, time, velocity vector) updates

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
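The bead (space-time prism) between two exact samples can be illustrated with a simple membership test: a location is possible at time t if it is reachable from the first sample and the second sample is still reachable from it, both under the speed bound. This is a minimal sketch for free-space motion; the function name and the sample values are hypothetical.

```python
from math import hypot

def in_bead(p, t, sample1, sample2, v_max):
    """Check whether location p = (x, y) is possible at time t.

    sample1 = (x1, y1, t1) and sample2 = (x2, y2, t2) are exact location
    samples with t1 <= t2; v_max bounds the speed in between.
    """
    (x1, y1, t1), (x2, y2, t2) = sample1, sample2
    if not (t1 <= t <= t2):
        return False
    reachable_from_first = hypot(p[0] - x1, p[1] - y1) <= v_max * (t - t1)
    can_reach_second = hypot(x2 - p[0], y2 - p[1]) <= v_max * (t2 - t)
    return reachable_from_first and can_reach_second

# Hypothetical samples: (0, 0) at t = 0 and (10, 0) at t = 10.
on_the_way = in_bead((5.0, 0.0), 5.0, (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), v_max=1.5)
too_far = in_bead((5.0, 5.0), 5.0, (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), v_max=1.0)
```

At a fixed time the set of possible locations is the intersection of two disks; a necklace is the sequence of such beads over consecutive samples.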

57

Modeling Uncertain Trajectories: Model 2

Constraints

• Motion along road network

• Uncertainty restricted to road segments

[Figure: samples p1 and p2 at times t1 and t2 on a road network, with q and a time instant t marked]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants tb, t1 and te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants tb, tb1, t1, t11 and te marked]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]; its children are nodes of the form [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]; each node is refined further over sub-intervals of its own time interval, e.g. [Tri11, tb, t11, D11]]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

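Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

Trq = 2D point Q. The rest are possible locations bounded by circles

[Figure: point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked]

Observation
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y) \, dx \, dy$$

where $D(Q, R_d)$ denotes the disk of radius $R_d$ centered at Q.

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \neq j} \left( 1 - P^{WD}_{i,Q}(R_d) \right) dR_d$$

where $pdf^{WD}_{j,Q}$ is the derivative of $P^{WD}_{j,Q}$ with respect to $R_d$, i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax …)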
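As a cross-check for such integrals, the NN probabilities can also be approximated by sampling. The sketch below is not the integration scheme of the slides: it swaps in plain Monte-Carlo sampling, assumes uniform pdfs over disks of equal radius, and uses hypothetical names and coordinates.

```python
import random
from math import cos, sin, pi, sqrt, hypot

def sample_in_disk(center, radius, rng):
    """Draw a point uniformly at random from a disk."""
    r = radius * sqrt(rng.random())  # sqrt gives uniform density over the area
    angle = 2.0 * pi * rng.random()
    return center[0] + r * cos(angle), center[1] + r * sin(angle)

def nn_probabilities(q, centers, radius, num_samples, seed=0):
    """Estimate, per object, the probability of being the NN of crisp point q."""
    rng = random.Random(seed)
    wins = [0] * len(centers)
    for _ in range(num_samples):
        dists = []
        for c in centers:
            x, y = sample_in_disk(c, radius, rng)
            dists.append(hypot(x - q[0], y - q[1]))
        wins[dists.index(min(dists))] += 1
    return [w / num_samples for w in wins]

# Hypothetical setting: the third object is farther away than Rmax.
probs = nn_probabilities((0.0, 0.0), [(2.0, 0.0), (3.0, 0.0), (10.0, 0.0)],
                         radius=1.0, num_samples=2000)
```

In this setting Rmax is 3 (the largest possible distance of the closest disk), and the third disk has a minimum distance of 9, so it never wins, in line with the pruning observation.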

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1 to Tr5, with the distances Rmin and Rmax and two points Z1 and Z2 such that dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the axes x and y, each over an uncertainty disk of diameter 2r]

Example: assuming a uniform pdf of possible locations; shown are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation

Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P\left( \lVert V_i - V_q \rVert \le R_d \right)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
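The convolution step can be illustrated on discretized locations, where the pdfs become probability mass functions over grid points. This is a sketch under that discretization assumption; the function names and the toy pmfs are hypothetical.

```python
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent discrete 2D locations.

    Both inputs map (x, y) grid points to probabilities; the result is the
    convolution of pmf_i with the mirrored pmf_q.
    """
    out = defaultdict(float)
    for (xi, yi), p_i in pmf_i.items():
        for (xq, yq), p_q in pmf_q.items():
            out[(xi - xq, yi - yq)] += p_i * p_q
    return dict(out)

def within_distance(pmf_iq, rd):
    """Probability that the difference vector has length at most rd."""
    return sum(p for (dx, dy), p in pmf_iq.items() if dx * dx + dy * dy <= rd * rd)

# Hypothetical toy pmfs with two possible locations each.
pmf_tr1 = {(2, 0): 0.5, (3, 0): 0.5}
pmf_trq = {(0, 0): 0.5, (1, 0): 0.5}
pmf_diff = difference_pmf(pmf_tr1, pmf_trq)
p_within_2 = within_distance(pmf_diff, 2)
```

After the convolution, the uncertain query object is folded into the difference variable, so the within-distance test is taken against the origin.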

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the axes x and y, each over a support of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A \cdot t^2 + B \cdot t + C}$$

where $A = V_{xiq}^2 + V_{yiq}^2$, and B and C follow from the relative position at the start of the motion segment. This is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 and their lower envelope LE12 over the time instants tb, t11 and t12; a critical time-point is marked. NOTE: time dimension on the horizontal axis]
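The "at most twice" observation follows from squaring both distance functions: setting them equal leaves a quadratic equation in t. A minimal sketch, assuming linear relative motion within the interval; the function names and the example motions are hypothetical.

```python
from math import sqrt

def distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) of the squared distance A*t^2 + B*t + C from the origin
    for the relative motion rel_pos + t * rel_vel."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y

def critical_times(c1, c2, tb, te):
    """Time points in [tb, te] at which two distance functions are equal."""
    a, b, c = (u - v for u, v in zip(c1, c2))
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Object 1 rests at distance 1; object 2 passes through the origin at t = 3.
c_first = distance_coeffs((0.0, 1.0), (0.0, 0.0))
c_second = distance_coeffs((-3.0, 0.0), (1.0, 0.0))
times = critical_times(c_first, c_second, 0.0, 10.0)
```

Because distances are non-negative, equality of the squared distances is equivalent to equality of the distances, so no roots are lost or added by squaring.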

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
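For illustration, the lower envelope can also be computed with a much simpler brute-force procedure. The sketch below is deliberately not the O(N log N) divide-and-conquer construction: it collects all pairwise critical time-points and picks the minimal function on each elementary interval, which costs roughly O(N³) time. The names and the input format (coefficients of squared distances) are assumptions.

```python
from math import sqrt

def lower_envelope(coeffs, tb, te):
    """Brute-force lower envelope of functions sqrt(A*t^2 + B*t + C).

    coeffs is a list of (A, B, C) triples; the result is a list of
    (t_start, t_end, index) pieces ordered by time.
    """
    cuts = {tb, te}
    n = len(coeffs)
    for i in range(n):
        for j in range(i + 1, n):
            a, b, c = (u - v for u, v in zip(coeffs[i], coeffs[j]))
            if a == 0:
                roots = [] if b == 0 else [-c / b]
            else:
                disc = b * b - 4.0 * a * c
                roots = [] if disc < 0 else [(-b - sqrt(disc)) / (2.0 * a),
                                             (-b + sqrt(disc)) / (2.0 * a)]
            cuts.update(t for t in roots if tb < t < te)
    times = sorted(cuts)
    pieces = []
    for lo, hi in zip(times, times[1:]):
        mid = (lo + hi) / 2.0  # no crossing inside (lo, hi), so one test suffices
        best = min(range(n),
                   key=lambda k: coeffs[k][0] * mid * mid + coeffs[k][1] * mid + coeffs[k][2])
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], hi, best)  # extend the previous piece
        else:
            pieces.append((lo, hi, best))
    return pieces

# Hypothetical input: a resting object at distance 1 and one passing the origin at t = 3.
envelope = lower_envelope([(0.0, 0.0, 1.0), (1.0, -6.0, 9.0)], 0.0, 10.0)
```

Each piece of the result corresponds to one node [Tr, t_start, t_end] on the first level of an IPAC-NN-style answer, without the probability descriptors.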

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with the time instants t1, t2 and t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.

$$PP_i(a) = \{ P_j \mid tc(P_j) \le t_{i+1} - t_i \}$$

where $tc(P_j)$ denotes the minimum time cost of path $P_j$.

Example: if the pdf of selecting a possible path is uniform, $\Pr(P_j) = 1 / |PP_i(a)|$

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t), i.e.

$$PL_j(a, t) = \{ p \in P_j \mid tc(p_i, p) \le t - t_i \ \wedge\ tc(p, p_{i+1}) \le t_{i+1} - t \}$$

73

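Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4, with a segment m marked]

Probability of falling inside segment m (e.g. uniform pdf): p1 = 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S \text{ at } t) = \sum_{P_j \in PP_i(a)} \Pr(P_j) \cdot \Pr\left( p \in S \mid p \in PL_j(a, t) \right)$$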

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with the vertices A to E and the query point q; the expansion tree is "bubbling" along road-network segments; based on the earliest/latest times of arrival, one path is marked as a possible path that is definitely not part of the answer to the range query]
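The network expansion step can be sketched as a Dijkstra search that stops at the query radius. This is a simplified illustration: it assumes that q coincides with a vertex (on the slides q may lie on an edge), and the graph, the names and the edge lengths are hypothetical.

```python
import heapq

def network_expansion(graph, source, radius):
    """Return {vertex: network distance} for all vertices closer than radius.

    graph maps a vertex to a list of (neighbor, edge_length) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, ()):
            nd = d + length
            if nd < radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected road network given as adjacency lists.
roads = {
    "q": [("A", 1.0)],
    "A": [("q", 1.0), ("B", 2.0)],
    "B": [("A", 2.0), ("C", 5.0)],
    "C": [("B", 5.0)],
}
reached = network_expansion(roads, "q", radius=4.0)
```

The edges between reached vertices form the expansion tree whose movement R-trees are then fetched in the filtering step.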

77

Temporal-continuous query
  – Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-simplification Algorithm", SSHD 1997
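The idea of trading an acceptable error for a smaller data size can be illustrated with the basic Douglas-Peucker line simplification. This is a minimal sketch of the plain recursive algorithm, not of the sped-up variant from the cited paper; the function names and the sample polyline are hypothetical.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Simplify a polyline so that dropped points deviate by at most eps."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= eps:
        return [points[0], points[-1]]
    split = k + 1  # index of the farthest point in `points`
    left = douglas_peucker(points[:split + 1], eps)
    right = douglas_peucker(points[split:], eps)
    return left[:-1] + right  # the split point appears once

track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, eps=0.1)
```

The chosen eps becomes a bound on the location error of the reduced trajectory, which is exactly the kind of uncertainty the earlier models describe.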

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not a "Z" axis… one cannot travel back in time
  – How long should the sink tolerate data being "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
• Probabilistic results
• Time dimension

vs

Geometric models for uncertain spatio-temporal data
• Probabilistic results
• Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax; the query region Q covers 50% and 30% of the possible positions at two points in time]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g. Markov Chain)
  – Discrete Time + Continuous Space (e.g. Harris Chain)
  – Continuous Time + Discrete Space (e.g. Markov Process)
  – Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; from an inner hole the ball moves left with probability 0.4, stays with 0.2 and moves right with 0.4; at the border it stays with 0.4 and moves inward with 0.6]

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

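How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$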

94

How can we model this?

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
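The vector-matrix products above can be replayed with a small sketch in plain Python. The function names are hypothetical; the matrix entries are those of the transition matrix M of the wooden-board example.

```python
def transition_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Build the row-stochastic transition matrix of the wooden-board example."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = border_move, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m

def propagate(dist, m, steps=1):
    """Multiply the row vector `dist` with the matrix m, `steps` times."""
    for _ in range(steps):
        dist = [sum(dist[i] * m[i][j] for i in range(len(m))) for j in range(len(m))]
    return dist

M = transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]    # the ball starts in the third bucket
first = propagate(start, M)             # distribution after the first hit
second = propagate(start, M, steps=2)   # distribution after the second hit
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M, i.e. it is left unchanged by a further hit, which is why the distribution approaches it after many hits.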

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
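One simple way to learn transition probabilities from historical data is to count observed state transitions and normalize per source state. The slides do not specify the estimator, so the following is only a sketch under that assumption; the function name and the toy history are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(state_sequences):
    """Estimate transition probabilities by relative frequencies.

    state_sequences is a list of state sequences, one state per tick.
    The result is a sparse mapping: source state -> {target state: probability}.
    """
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# Hypothetical historical trajectories, already mapped to states and ticks.
history = [["s1", "s3", "s2", "s1"], ["s3", "s2", "s3", "s3"]]
transitions = estimate_transitions(history)
```

Only observed transitions get an entry, which matches the remark that the resulting matrix is very sparse.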

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 with their transitions, unrolled over the time steps t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

State distributions, starting in s1 at t = 0:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

t = 3, after discarding the worlds that already hit the window at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

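ST - Window Queries

The new absorbing state s4 collects the trajectories that have hit the window. M⁻ is applied at time steps outside T, M⁺ at time steps inside T:

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$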
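The construction with an absorbing state can be sketched as follows. The function names are hypothetical, and the sketch assumes that the query window starts after the initial time step, as in the example.

```python
def window_matrices(m, window_states):
    """Build M- and M+ with one additional absorbing 'hit' state."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        # probability mass entering the window is redirected to the hit state
        hit = sum(p for j, p in enumerate(row) if j in window_states)
        m_plus.append([0.0 if j in window_states else p for j, p in enumerate(row)] + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(m, start, window_states, t_start, t_end):
    """Probability of being in a window state at least once during [t_start, t_end].

    Assumes t_start >= 1, i.e. the window begins after the initial time step.
    """
    m_minus, m_plus = window_matrices(m, window_states)
    dist = list(start) + [0.0]
    for t in range(1, t_end + 1):
        step = m_plus if t_start <= t <= t_end else m_minus
        dist = [sum(dist[i] * step[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return dist[-1]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], window_states={0, 1}, t_start=2, t_end=3)
```

The cost is one vector-matrix product per time step, instead of enumerating the exponentially many paths.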

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over the time steps t = 0, 1, 2, 3]

106

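Multiple Observations

State vectors over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$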

107

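Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let $\blacksquare$ denote the event that the trajectory passes the query window, and $obs$ the observation of the object in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 3:

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$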
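The class-doubling and the final conditioning step can be replayed with a short sketch. The function name is hypothetical; the sketch tracks the 2|S| classes in one vector ordered as (S1-, S2-, S3-, S1+, S2+, S3+) and assumes the observation at t = 3 is exact.

```python
def doubled_step(dist, m, window_states, in_window):
    """One time step over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    out = [0.0] * (2 * n)
    for i in range(n):
        for j in range(n):
            # not-yet-hit worlds become hit worlds when they enter the window
            target = j + n if (in_window and j in window_states) else j
            out[target] += dist[i] * m[i][j]
            # worlds that already hit the window stay in the "+" classes
            out[j + n] += dist[i + n] * m[i][j]
    return out

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # object starts in s1, window not hit
for t in (1, 2, 3):
    dist = doubled_step(dist, M, window_states={0, 1}, in_window=(2 <= t <= 3))

# Second observation: the object is seen in s3 at t = 3.
hit, miss = dist[2 + 3], dist[2]  # classes S3+ and S3-
posterior = hit / (hit + miss)
```

Conditioning on the observation simply renormalizes the two classes that are consistent with it, which is the last step of the Bayes computation.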

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But: how to draw samples efficiently such that they conform with the observations?

› Solution: Adaption of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Page 13: Managing Uncertainty in  Spatial  and  Spatio -temporal Data


Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

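Let
• $\mathcal{DB}$ be an uncertain database having possible worlds,
• $W$ be the set of possible worlds of $\mathcal{DB}$,
• $\varphi$ be a query predicate,
• $\mathbb{I}(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise.

The probability that a query predicate $\varphi$ holds on an uncertain database $\mathcal{DB}$ is defined as

$$P(\varphi, \mathcal{DB}) = \sum_{w \in W} P(w) \cdot \mathbb{I}(\varphi, w)$$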

Possible World Semantics
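The definition of answering queries under possible world semantics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: objects are independent, each object has a small discrete set of alternative positions, and the names `objects`, `query_probability` and `at_least_one_near_origin` as well as all coordinates and probabilities are hypothetical and not taken from the slides.

```python
from itertools import product

def query_probability(objects, predicate):
    """Sum P(w) over all possible worlds w in which predicate(w) holds.

    objects maps an object id to a list of (position, probability)
    alternatives; objects are assumed to be mutually independent.
    """
    total = 0.0
    # One alternative per object defines one possible world.
    for world in product(*objects.values()):
        p_world = 1.0
        for _, p in world:
            p_world *= p
        positions = dict(zip(objects, (pos for pos, _ in world)))
        if predicate(positions):  # indicator function
            total += p_world
    return total

# Hypothetical toy database: two objects with two alternatives each.
objects = {
    "A": [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    "B": [((1.0, 1.0), 0.4), ((5.0, 5.0), 0.6)],
}

def at_least_one_near_origin(positions, radius=2.0):
    return any(x * x + y * y <= radius * radius for x, y in positions.values())

prob = query_probability(objects, at_least_one_near_origin)
```

The loop visits every possible world explicitly, so its cost grows exponentially with the number of objects.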

Possible Worlds Example II

[Figure: example database of uncertain objects labeled A to Z]

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

Querying Uncertain Data: Running Example

[Figure: circular query region centered at query point q]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected Positions

Most-likely Positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds: An intuition

[Figure: query region around q with objects C, D, H, I]

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

36

Generating Functions

[Figure: running example with query point q; labels A, B, C, H, H3 and probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions: Formally

For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i \cdot x) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
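The generating-function idea can be sketched as an incremental polynomial multiplication. The function name `count_distribution` is hypothetical, and the inside-probabilities 0.8, 0.2 and 0.4 are borrowed from the figure labels as an assumption about the running example; independence between objects is assumed as well.

```python
def count_distribution(inside_probs):
    """Expand prod_i (1 - p_i + p_i * x) and return its coefficients.

    coeffs[j] is the probability that exactly j objects are inside the
    query region, assuming the objects are independent.
    """
    coeffs = [1.0]  # polynomial "1": zero objects processed so far
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object outside: count stays k
            nxt[k + 1] += c * p       # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

# Assumed inside-probabilities for three objects (illustrative values).
dist = count_distribution([0.8, 0.2, 0.4])
```

Each of the N objects costs one pass over at most N + 1 coefficients, so the whole distribution is obtained in O(N²) time instead of enumerating 2^N inside/outside combinations.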

41

Count Queries on Uncertain Data

Example

[Figure: running example with query point q; labels A, B, C, H, H3 and probabilities 0.8, 0.2, 0.4; the example equations are built up step by step]

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
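A minimal Monte-Carlo sketch of this sampling scheme for the count query follows. It assumes independent objects described only by their probability of being inside the query region; the function name `sample_count_distribution` and the probabilities 0.8, 0.2, 0.4 are illustrative assumptions.

```python
import random

def sample_count_distribution(inside_probs, num_samples, seed=0):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = [0] * (len(inside_probs) + 1)
    for _ in range(num_samples):
        # Draw one possible world: each object is inside with probability p.
        k = sum(1 for p in inside_probs if rng.random() < p)
        counts[k] += 1  # evaluate the count query on this world
    return [c / num_samples for c in counts]

# 100 sampled worlds, as in the slide's example size.
estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
```

With only 100 worlds the estimates can deviate noticeably from the exact coefficients of the generating function, which is why a confidence statement is needed.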

48

Sampling Example

[Figure: running example with query point q; labels A, B, C, H, H3 and probabilities 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

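Sampling: Confidences

[Figure: running example with query point q; labels A, B, C, H, H3 and probabilities 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$ the true probability is in the interval [0.442, 0.638]

True probability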
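The Wald interval can be computed with the Python standard library. The values used below are assumptions: an estimate of 0.54 from 100 sampled worlds at a significance level of 0.05 is not stated in the extracted text, but it reproduces the interval [0.442, 0.638] quoted on the slide. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Assumed inputs: estimator 0.54 from |S| = 100 worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
```

The interval shrinks with the square root of the number of samples, so halving its width requires four times as many sampled worlds.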

50

Uncertain Spatial Data Management Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory with time as the third axis, up to the present time]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q, t1, t2, t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants tb, tb1, t1, t11, te]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]; its children are the nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each child has children of its own over sub-intervals of its interval, e.g. (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

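Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q, uncertainty disks of Tr1 to Tr5, distances Rmin, Rmax and Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{A_i(R_d)} pdf_i(x, y)\, dx\, dy$$

where $A_i(R_d)$ is the part of the uncertainty disk of Tri that lies within distance Rd from Q.

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \neq j} \left(1 - P^{WD}_{i,Q}(R_d)\right) dR_d$$

where $pdf^{WD}_{j,Q}$ is the derivative of $P^{WD}_{j,Q}$ with respect to $R_d$, i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)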

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ, uncertainty disks of Tr1 to Tr5, distances Rmin and Rmax, points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each with a support of width 2r]

Example: assuming a uniform pdf of possible locations; these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane, each with a support of width 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

where $A = V_{xiq}^2 + V_{yiq}^2$, and B and C depend on the initial relative position (a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example

[Figure: lower envelope LE12 of two distance functions TR1 and TR2 starting at tb, with critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
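The "at most two intersections" observation follows from setting the squared distance functions equal, which leaves a quadratic equation in t. The sketch below assumes linear motion of the expected locations, so each squared distance is A·t² + B·t + C; the function names and the example coordinates are hypothetical.

```python
from math import sqrt

def distance_coeffs(p_i, v_i, p_q, v_q):
    """Coefficients (A, B, C) of the squared distance A*t^2 + B*t + C
    between two objects moving linearly from p with velocity v."""
    dx, dy = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    return (vx * vx + vy * vy, 2.0 * (dx * vx + dy * vy), dx * dx + dy * dy)

def critical_times(f, g, tb, te, eps=1e-12):
    """Time points in [tb, te] where two distance functions intersect."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < eps:                       # difference is linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical setup: static query at the origin, one moving and one static object.
f1 = distance_coeffs((-4.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
f2 = distance_coeffs((0.0, 2.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
times = critical_times(f1, f2, tb=0.0, te=10.0)
```

Between consecutive critical time-points the ranking of the two distance functions cannot change, which is what makes the lower envelope piecewise well defined.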

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1 and TR2; critical time-points t11, t12) and LE34 (of TR3 and TR4; critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 4r pruning band, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform, $\Pr(P_j) = 1 / |PP_i(a)|$

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

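[Figure: possible paths and possible locations of object a when t = 4, with segment m]

Probability of falling inside segment m: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \mid PL^{a}_{p}(t)\big)$$

(e.g. uniform pdf over the possible locations)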

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
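As an illustration of error-bounded data reduction, here is a sketch of the basic recursive Douglas-Peucker line simplification. Note that this is the plain textbook algorithm, not the sped-up variant of the cited paper; the function names and the parameter `epsilon` (the acceptable error) are naming assumptions.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep only points that deviate more than epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]    # all inner points are within the error
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right              # split point appears only once

straight = douglas_peucker([(0, 0), (1, 0), (2, 0), (3, 0)], epsilon=0.5)
bent = douglas_peucker([(0, 0), (1, 5), (2, 0)], epsilon=1.0)
```

The dropped points become a bounded location uncertainty of at most `epsilon` around the simplified polyline, which is the trade-off between data size and precision that this slide refers to.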

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

vs

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, with a query window Q; the car is inside the window with 50% and 30% probability at two points in time]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: an interior hole with probabilities 0.4, 0.2, 0.4 (left, stay, right) and a border hole with probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

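› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$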

94

How can we model this

› First hit:

$$(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)$$

› Second hit:

$$(0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)$$

› 40th hit:

$$(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)$$
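The vector-matrix products of the ball example can be reproduced with a few lines of plain Python. The helper names `transition_matrix` and `step` are hypothetical; the matrix entries are those of the 9-bucket matrix M of the example.

```python
def transition_matrix(n=9):
    """Transition matrix of the ball example (row: from bucket, column: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border bucket
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border bucket
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                            # interior bucket: left, stay, right
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: multiply the row vector dist with the matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

M = transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # ball starts in the third bucket
first = step(start, M)
second = step(first, M)

# The distribution shown for the 40th hit is the stationary distribution of M:
stationary = [0.08] + [0.12] * 7 + [0.08]
still_stationary = step(stationary, M)
```

Applying `step` repeatedly moves the distribution towards `stationary`, which `step` leaves unchanged.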

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, t = 1, t = 2, t = 3]

Note: We have an exponential number of possible paths the car might take

State distributions, starting in s1 at t = 0:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

At t = 3, restricted to the worlds that avoided the window at t = 2: (0, 0.16, 0.04). Only the 0.04 in s3 never enters the window, so Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

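ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1\ \ 0\ \ 0\ \ 0) \xrightarrow{M^{-}} (0\ \ 0\ \ 1\ \ 0) \xrightarrow{M^{+}} (0\ \ 0\ \ 0.2\ \ 0.8) \xrightarrow{M^{+}} (0\ \ 0\ \ 0.04\ \ 0.96)$$

States: s1, s2, s3 and the new state s4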
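The two-matrix construction can be sketched as follows: an artificial absorbing state collects all worlds that have entered the window, and transitions into window states are redirected to it while the window is active. The function name `window_hit_probability` and its parameters are hypothetical, and states are numbered from 0 (s1 is state 0).

```python
def window_hit_probability(M, start_state, window, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    n = len(M)
    hit = n  # index of the artificial absorbing state

    def augmented(window_active):
        A = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(n):
                # While the window is active, entering a window state means "hit".
                target = hit if (window_active and j in window) else j
                A[i][target] += M[i][j]
        A[hit][hit] = 1.0  # once hit, always hit
        return A

    m_minus, m_plus = augmented(False), augmented(True)
    dist = [0.0] * (n + 1)
    if t_from == 0 and start_state in window:
        dist[hit] = 1.0
    else:
        dist[start_state] = 1.0
    for t in range(1, t_to + 1):
        A = m_plus if t >= t_from else m_minus
        dist = [sum(dist[i] * A[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist[hit]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_hit_probability(M, start_state=0, window={0, 1}, t_from=2, t_to=3)
```

The cost is one vector-matrix product per time step, independent of the exponential number of possible paths.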

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time, with one observation at t0 and with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]

106

Multiple Observations

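State order: (S1-, S2-, S3-, S1+, S2+, S3+)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$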

107

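Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With $\blacksquare$ denoting "the trajectory hits the window" and $o$ denoting the observation:

$$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare) \cdot P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(o \wedge \blacksquare)}{P(o \wedge \blacksquare) + P(o \wedge \neg\blacksquare)}$$

Using the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$$\frac{0.32}{0.32 + 0.04} = 0.89$$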
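The doubled state space and the final Bayes step can be sketched as below. The sketch assumes that the second observation is made at the end of the query window and that the window starts after t = 0; the function name `conditional_hit_probability` and its parameters are hypothetical, and states are numbered from 0 (s3 is state 2).

```python
def conditional_hit_probability(M, start, window, t_from, t_to, observed_state):
    """P(window was hit | object observed in observed_state at time t_to).

    Index i stands for class S_i- (at s_i, window not yet hit),
    index n + i for class S_i+ (at s_i, window already hit).
    Assumes t_from >= 1, so the start state is always a "not hit" class.
    """
    n = len(M)
    dist = [0.0] * (2 * n)
    dist[start] = 1.0
    for t in range(1, t_to + 1):
        active = t >= t_from
        nxt = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                # Worlds without a hit move to S_j+ when entering an active window.
                target = n + j if (active and j in window) else j
                nxt[target] += dist[i] * M[i][j]
                # Worlds with a hit keep their "+" flag.
                nxt[n + j] += dist[n + i] * M[i][j]
        dist = nxt
    miss, hit = dist[observed_state], dist[n + observed_state]
    if hit + miss == 0.0:
        raise ValueError("observation has probability zero under the model")
    return hit / (hit + miss)  # Bayes: P(hit and obs) / P(obs)

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = conditional_hit_probability(M, start=0, window={0, 1},
                                        t_from=2, t_to=3, observed_state=2)
```

Conditioning on the observation only requires the two entries of the final vector that belong to the observed state, so no additional passes over the possible worlds are needed.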

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over the location and time dimensions]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaption of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Page 14: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

14

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let
• $DB$ be an uncertain database having possible worlds $w$,
• $W$ be the set of possible worlds of $DB$,
• $\varphi$ be a query predicate,
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise.

The probability that a query predicate $\varphi$ holds on an uncertain database $DB$ is defined as

$$P(\varphi, DB) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$
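The definition can be illustrated by brute-force enumeration. The following is a minimal sketch, assuming a discrete model with mutually independent objects; the toy database, the object names and the predicate are hypothetical and not taken from the slides.

```python
from itertools import product

def query_probability(objects, predicate):
    """Sum the probabilities of all possible worlds in which `predicate` holds.

    `objects` maps an object id to a list of (instance, probability) pairs.
    Instances of different objects are assumed to be independent.
    """
    ids = list(objects)
    total = 0.0
    # Every combination of one instance per object is one possible world.
    for combo in product(*(objects[i] for i in ids)):
        world = {i: inst for i, (inst, _) in zip(ids, combo)}
        p_world = 1.0
        for _, p in combo:
            p_world *= p
        if predicate(world):  # indicator function of the predicate
            total += p_world
    return total

# Hypothetical toy database: two objects with two alternative 1D positions each.
toy_db = {
    "A": [(1.0, 0.7), (5.0, 0.3)],
    "B": [(2.0, 0.4), (9.0, 0.6)],
}

def both_within_3(world):
    """Predicate: every object lies within distance 3 of the origin."""
    return all(abs(pos) <= 3.0 for pos in world.values())

p_both = query_probability(toy_db, both_within_3)
```

The loop visits every possible world, so the running time grows exponentially with the number of objects, which is exactly the problem the following slides address.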

Possible World Semantics

Possible Worlds Example II

[Figure: uncertain spatial objects labeled A to Z]

20

Possible Worlds Example II

[Figure: uncertain spatial objects labeled A to Z]

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with uncertain objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic "best guess"

– Expected positions

– Most-likely positions

– …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world $w$ and any possible world $w'$ derived from $w$ by changing the position of an object that lies outside the query region, the following equivalence holds: both worlds yield the same query result

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

[Figure: query region around q with the remaining objects C, D, H, I]

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with the remaining objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region

Information about concrete result objects can be discarded

[Figure: query region around q with the remaining objects C, D, H, I]

36

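[Figure: circular query region centered at q; objects A, B, C with probability labels 0.8, 0.2, 0.4; further labels H, H3]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results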

37

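[Figure: circular query region centered at q; objects replaced by x, with probability labels 0.8, 0.2, 0.4; further labels H, H3]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x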

39

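[Figure: circular query region centered at q; objects replaced by x, with probability labels 0.8, 0.2, 0.4; further labels H, H3]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$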

40

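For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$\mathcal{F} = \prod_{i=1}^{N} \bigl(p_i \cdot x + (1 - p_i)\bigr) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

Generating Functions Formally

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)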
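The polynomial expansion can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; it assumes independent objects and uses the probability labels 0.8, 0.2 and 0.4 that survive from the running example's figure (which label belongs to which object does not matter for the product).

```python
def count_distribution(inside_probs):
    """Coefficients c[k] of prod_i (p_i * x + (1 - p_i)).

    c[k] is the probability that exactly k objects are inside the query region,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: the count stays k
            nxt[k + 1] += c * p      # object inside: the count becomes k + 1
        coeffs = nxt
    return coeffs

# Probability labels read from the running example's figure.
dist = count_distribution([0.8, 0.2, 0.4])
# dist[k] = P(exactly k objects inside), i.e. 0.096, 0.472, 0.368, 0.064
```

Each object costs one pass over the current coefficient list, so N objects need O(N²) multiplications instead of enumerating 2^N equivalence classes.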

41

Example

[Figure: circular query region centered at q; objects A, B, C with probability labels 0.8, 0.2, 0.4; further labels H, H3; the expansion steps of the generating function are shown as equations]

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query $\varphi$ on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of sampled results is an unbiased approximation of the true distribution of results
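The sampling steps above can be sketched as follows for the count query of the running example. The function name, the fixed seed and the independence of the objects are assumptions made for illustration.

```python
import random

def sample_count_distribution(inside_probs, num_samples, seed=0):
    """Estimate P(exactly k objects inside) from independently drawn possible worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    counts = [0] * (len(inside_probs) + 1)
    for _ in range(num_samples):
        # Draw one possible world: each object is inside with its own probability.
        k = sum(1 for p in inside_probs if rng.random() < p)
        counts[k] += 1
    return [c / num_samples for c in counts]

# 100 sampled worlds for the probability labels of the running example.
estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
```

With only 100 worlds the estimates can deviate noticeably from the exact probabilities, which motivates attaching confidence statements to them.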

48

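[Figure: circular query region centered at q; objects A, B, C with probability labels 0.8, 0.2, 0.4; further labels H, H3]

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations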

49

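[Figure: circular query region centered at q; objects A, B, C with probability labels 0.8, 0.2, 0.4; further labels H, H3]

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\hat{p}\,(1-\hat{p})\,/\,|S|}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]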

True probability
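The interval can be reproduced with a short sketch of the Wald interval. The estimator value 0.54 is an assumption: it is the midpoint of the reported interval [0.442, 0.638] and, together with the 100 samples and a significance level of 0.05, yields exactly these bounds.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of the standard normal
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed estimator 0.54 from 100 sampled worlds.
low, high = wald_interval(0.54, 100)
```

The half-width shrinks with the square root of the sample size, so halving the interval requires four times as many sampled worlds.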

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: 3D trajectory over the axes X, Y and Time, together with its 2D route; the present time is marked on the time axis]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a location q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A-nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels hold nodes of the same form over sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd marked]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{A_i(R_d)} pdf_i(x, y)\, dx\, dy$$

where $A_i(R_d)$ is the part of the uncertainty disk of Tri that lies within distance Rd from Q

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \neq j} \bigl(1 - P^{WD}_{i,Q}(R_d)\bigr)\, dR_d$$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1, …, Tr5 with Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the (x, y) plane, each supported on a disk of diameter 2r]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) plotted over the (x, y) plane, each supported on a disk of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time points

Example

[Figure: lower envelope LE12 of the two distance functions TR1 and TR2 over time, with the critical time points t11 and t12 after tb]

NOTE: time dimension on the horizontal axis
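The critical time points of two distance functions can be computed by solving one quadratic equation. The sketch below is an illustration under stated assumptions: each squared distance is given by coefficients (A, B, C) derived from a hypothetical relative offset at t = 0 and a constant relative velocity, and all function names are made up for this example.

```python
import math

def squared_distance_coeffs(dx0, dy0, vx, vy):
    """(A, B, C) with d(t)^2 = A*t^2 + B*t + C for relative offset (dx0, dy0)
    at t = 0 and constant relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)

def critical_time_points(f, g, tb, te):
    """Times in [tb, te] at which two distance functions intersect (at most two)."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))  # d_f(t)^2 - d_g(t)^2
    if abs(a) < 1e-12:
        if abs(b) < 1e-12:
            return []
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

def distance(f, t):
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))

def lower_envelope(f, g, tb, te):
    """Pieces (t_start, t_end, winner) of the lower envelope; winner 0 = f, 1 = g."""
    inner = [t for t in critical_time_points(f, g, tb, te) if tb < t < te]
    cuts = [tb] + inner + [te]
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)  # between critical points the order cannot change
        pieces.append((lo, hi, 0 if distance(f, mid) <= distance(g, mid) else 1))
    return pieces

# Hypothetical inputs: TR1 keeps distance 1, TR2 passes through the origin at t = 3.
tr1 = squared_distance_coeffs(1.0, 0.0, 0.0, 0.0)
tr2 = squared_distance_coeffs(-3.0, 0.0, 1.0, 0.0)
envelope = lower_envelope(tr1, tr2, 0.0, 10.0)
```

Comparing squared distances avoids the square root when looking for intersections, which is why two such functions can meet at most twice.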

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11 and t12) and LE34 (of TR3 and TR4, critical time point t31) are merged into LE1234 over [tb, te], which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the critical time points t1, t2, t3; shown are the lower envelope, the pruning band of width 4r, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) has probability 1/|PPi(a)|

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

[Figure: possible paths of object a and its possible locations when t = 4, with a segment m highlighted]

Probability of falling inside segment m: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(S \mid PL_p(t)\bigr)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; annotations: expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted with reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

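A highway example…

[Figure: position (pos) of a car over time (t) with a query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position over time with the query region Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position over time with the query region Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: position over time with the query region Q; labels 50 and 30 at the two query times]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: position over time with the query region Q; labels 50 and 30 at the two query times]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position over time with the query region Q; labels 50 and 30 at the two query times]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5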
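The two numbers can be retraced with a few lines of arithmetic. This is only a sketch: the function name is made up, and the nested case is one hypothetical reading of the dependency (every trajectory inside Q at the second time instant was already inside Q at the first one), which happens to reproduce the value 0.5.

```python
def at_least_once_independent(probs):
    """P(inside Q at one or more time instants) if the instants were independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p  # probability of missing Q at every instant so far
    return 1.0 - miss

# Probabilities 0.5 and 0.3 of being inside Q at the two time instants.
p_independent = at_least_once_independent([0.5, 0.3])

# Hypothetical dependent reading: the worlds inside Q at the second instant are
# a subset of those inside Q at the first, so the union is just the larger set.
p_nested = max(0.5, 0.3)
```

The gap between the two values illustrates why ignoring the temporal dependency between consecutive positions overestimates the result here.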

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; probability labels 0.4, 0.2, 0.4 for an interior hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this

› First hit

(0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit

(0 0 1 0 0 0 0 0 0) · M² = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit

(0 0 1 0 0 0 0 0 0) · M⁴⁰ = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

where M is the 9 × 9 transition matrix of the board example
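The vector-matrix products can be retraced with plain Python lists. This is a minimal sketch of the board example; the helper names are made up, and the long run of 2000 hits is an illustrative choice to show the distribution the chain settles into.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                            # interior bucket
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: multiply the row vector `dist` with the matrix `m`."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
long_run = after_hits(start, M, 2000)    # approaches the stationary distribution
```

After many hits the result no longer depends on the start bucket and approaches (0.08, 0.12, …, 0.12, 0.08), the values the slide reports for the 40th hit.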

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: transition graph over the states s1, s2, s3]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

99

ST - Window Queries

t=1: (1, 0, 0) · M = (0, 0, 1)

100

ST - Window Queries

t=2: (0, 0, 1) · M = (0, 0.8, 0.2)

101

ST - Window Queries

t=3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)

102

ST - Window Queries

t=3, restricted to the worlds that were not in s1 or s2 at t=2: (0, 0, 0.2) · M = (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

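States: s1, s2, s3, s4 (s4 is the new state for the winner trajectories)

M⁻ =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M⁺ =
( 0  0  1    0   )
( 0  0  0.4  0.6 )
( 0  0  0.2  0.8 )
( 0  0  0    1   )

(1, 0, 0, 0) · M⁻ = (0, 0, 1, 0)

(0, 0, 1, 0) · M⁺ = (0, 0, 0.2, 0.8)

(0, 0, 0.2, 0.8) · M⁺ = (0, 0, 0.04, 0.96)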
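The two-matrix scheme can be sketched as follows. This is an illustrative sketch rather than the authors' code: function names are made up, states are indexed from 0, and the window is assumed to start at t ≥ 1 so that the start position itself never counts as a hit.

```python
def vec_mat(v, m):
    """Multiply the row vector v with the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing state for the winner trajectories."""
    n = len(m)
    hit = n  # index of the extra state (s4 in the 3-state example)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            # Transitions into a window state are redirected to the hit state.
            target = hit if j in window_states else j
            m_plus[i][target] += m[i][j]
    m_plus[hit][hit] = 1.0
    return m_minus, m_plus

def window_probability(m, start, window_states, t_from, t_to):
    """P(object is in a window state at some time in [t_from, t_to]), t_from >= 1."""
    m_minus, m_plus = window_matrices(m, window_states)
    dist = list(start) + [0.0]
    for t in range(1, t_to + 1):
        # Use M+ when the target time lies inside the query window, M- otherwise.
        dist = vec_mat(dist, m_plus if t_from <= t <= t_to else m_minus)
    return dist[-1]

# Car starts in s1 (index 0); window: states s1 or s2 during T = [2, 3].
p_hit = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

Because the hit state is absorbing, a trajectory that entered the window once stays counted, and the answer is read off the last vector entry after a linear number of sparse multiplications.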

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si⁻ corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si⁺ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

106

Multiple Observations

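[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3; ∎ marks worlds that have intersected the window, not ∎ those that have not]

State order: (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺)

t=0: (1, 0, 0, 0, 0, 0)

t=1: (0, 0, 1, 0, 0, 0)

t=2: (0, 0, 0.2, 0, 0.8, 0)

t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M⁻ =
( M  0 )
( 0  M )

M⁺ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )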

107

Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory passes the query window and obs denoting the observation of the object in s3:

P(∎ | obs) = P(obs | ∎) · P(∎) / P(obs)

= P(∎ ∧ obs) / P(obs)

= P(∎ ∧ obs) / (P(obs ∧ ∎) + P(obs ∧ not ∎))

= 0.32 / (0.32 + 0.04) = 0.89

State order of the vector at t=3: (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺) = (0, 0, 0.04, 0.48, 0.16, 0.32)
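The doubled state space and the final conditioning step can be sketched as below. The sketch rests on stated assumptions: names are made up, states are indexed from 0, the window starts at t ≥ 1, and the second observation is taken at the last propagated time step, so nothing after it needs to be considered.

```python
def vec_mat(v, m):
    """Multiply the row vector v with the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def doubled_matrices(m, window_states):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Outside the window: "-" stays "-" and "+" stays "+".
            m_minus[i][j] = m[i][j]
            m_minus[n + i][n + j] = m[i][j]
            # Inside the window: entering a window state switches "-" to "+".
            m_plus[i][n + j if j in window_states else j] = m[i][j]
            m_plus[n + i][n + j] = m[i][j]
    return m_minus, m_plus

def hit_given_observation(m, start_state, window_states, t_from, t_to, obs_time, obs_state):
    """P(window was hit | object observed in obs_state at obs_time), obs_time >= t_to."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    dist = [0.0] * (2 * n)
    dist[start_state] = 1.0  # first observation: certain start state, no hit yet
    for t in range(1, obs_time + 1):
        dist = vec_mat(dist, m_plus if t_from <= t <= t_to else m_minus)
    hit, miss = dist[n + obs_state], dist[obs_state]
    return hit / (hit + miss), dist  # Bayes: condition on the observed state

# Start in s1 at t = 0, window {s1, s2} during [2, 3], observation in s3 at t = 3.
p_cond, dist_t3 = hit_given_observation(M, 0, {0, 1}, 2, 3, 3, 2)
```

Only the two classes belonging to the observed state enter the final ratio, which is the last line of the Bayes derivation.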

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries with probability P and object id oid over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


15

Research Challenge

Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process

Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process

Improve the quality of modern location based applications and of research results in the field

16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• $D$ be an uncertain database
• $W$ be the set of possible worlds of $D$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise

The probability that a query predicate $\varphi$ holds on an uncertain database $D$ is defined as

$P(\varphi(D)) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$
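As a rough illustration of this definition, the sum over possible worlds can be evaluated by brute force. The sketch below assumes a discrete model with mutually independent objects, so that the probability of a world is the product of the chosen instance probabilities. The toy database, the query point and the function name `query_probability` are hypothetical.

```python
from itertools import product
from math import dist, prod

def query_probability(db, predicate):
    """Sum P(w) over all possible worlds w in which the predicate holds."""
    total = 0.0
    for world in product(*db.values()):        # one instance per object
        p_world = prod(p for _, p in world)    # independence assumption
        instance = {name: loc for name, (loc, _) in zip(db, world)}
        if predicate(instance):                # indicator function I
            total += p_world
    return total

# Hypothetical uncertain database: object -> [(location, probability), ...]
db = {
    "A": [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    "B": [((1.0, 1.0), 0.4), ((5.0, 5.0), 0.6)],
}
q = (0.0, 0.0)

def some_object_near_q(world):
    return any(dist(loc, q) <= 2.0 for loc in world.values())

p = query_probability(db, some_object_near_q)
print(round(p, 3))
```

The loop visits every combination of instances, so its running time grows exponentially with the number of objects.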

Possible Worlds Example II

[Figure: uncertain spatial objects labeled A through Z]

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with objects C, D, H and I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

Observation 1

For any possible world w and any possible world w′ derived from w by changing the position of an object o that cannot lie inside the query region, the query returns the same result on w and on w′

32

Equivalent Worlds An intuition

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with the remaining objects C, D, H and I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

[Figure: query region around q with objects C, D, H and I]

36

Generating Functions

[Figure: query region around q with uncertain objects; probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions Formally

For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]

$F(x) = \prod_{i} (1 - p_i + p_i \cdot x) = \sum_{j} c_j x^j$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
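The polynomial expansion can be carried out incrementally, one object at a time. The following Python sketch assumes independent objects and uses the three probability labels from the running example (0.8, 0.2, 0.4) as illustrative inputs; which object carries which probability is an assumption, and the function name `count_distribution` is made up here.

```python
def count_distribution(probs):
    """Expand prod(1 - p + p*x); coeffs[j] = P(exactly j objects inside)."""
    coeffs = [1.0]                       # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)      # object outside: exponent unchanged
            nxt[j + 1] += c * p          # object inside: exponent + 1
        coeffs = nxt
    return coeffs

# Assumed "inside" probabilities of three independent objects.
dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication touches at most n+1 coefficients, so n objects cost O(n²) operations instead of the 2^n inside/outside combinations.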

41

Count Queries on Uncertain Data

[Figure: worked example expanding the generating function for the running example; probability labels 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied

I A traditional query on certain data can be answered in polynomial time

II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling Example

[Figure: running example around q; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

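Sampling Confidences

[Figure: running example around q; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\hat{p}(1-\hat{p})/n}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]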

True probability
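A short Python sketch of sampling with a Wald interval follows. The estimator value and significance level are not legible in the slide text; the values 0.54 (from 100 samples) and α = 0.05 are an inference, chosen because they reproduce the interval [0.442, 0.638]. The object probabilities and function names are likewise illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled proportion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def estimate(probs, predicate, n, rng):
    """Fraction of n sampled worlds that satisfy the predicate."""
    hits = 0
    for _ in range(n):
        world = [rng.random() < p for p in probs]  # inside/outside per object
        hits += bool(predicate(world))
    return hits / n

# Assumed inputs: an estimate of 0.54 from 100 samples at alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638

# Sampling the running example (assumed probabilities 0.8, 0.2, 0.4):
rng = random.Random(0)
p_hat = estimate([0.8, 0.2, 0.4], lambda w: sum(w) == 1, 100, rng)
print(p_hat, wald_interval(p_hat, 100))
```

The interval narrows with the square root of the sample size, so halving its width takes four times as many sampled worlds.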

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory over time, up to the present time]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints
– Motion along road network
– Uncertainty restricted to road segments

[Figure: road-network example with positions p1 and p2, query point q, and times t1 and t2]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the X-Y plane, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 with location uncertainty over the X-Y plane, with the time instants tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]; its children are nodes of the form [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]; deeper levels subdivide each child's time interval in the same way]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q with the uncertainty disks of Tr1 to Tr5, the radii Rmin and Rmax around Q, and a distance Rd]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_{i}(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\,dx\,dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \cdot \prod_{i \ne j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$

where $pdf^{WD}_{j}$ is the derivative of $P^{WD}_{j}$ with respect to $R_d$, i.e. Trj is the NN if it's at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain TrQ with Tr1 to Tr5, the radii Rmin and Rmax, and two points Z1, Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2, each a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation

Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq), each with base diameter 4r and peak height 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: [Figure: distance over time for TR1 and TR2 with the times tb, t11 and t12, their lower envelope LE12, and a critical time-point marked; time is on the horizontal axis]
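The "at most twice" observation follows because setting two such distance functions equal, after squaring, leaves a quadratic equation in t. A small Python sketch, assuming linear motion of the expected locations relative to the query object; the function names and the sample inputs are hypothetical:

```python
from math import sqrt

def squared_distance_coeffs(p0, v):
    """Coefficients (A, B, C) of d(t)^2 = A*t^2 + B*t + C for a relative
    position p0 at t = 0 moving with relative velocity v."""
    A = v[0] ** 2 + v[1] ** 2
    B = 2.0 * (p0[0] * v[0] + p0[1] * v[1])
    C = p0[0] ** 2 + p0[1] ** 2
    return A, B, C

def critical_times(f, g, tb, te):
    """Times in [tb, te] at which two distance functions are equal."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:                       # difference is linear
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            r = sqrt(disc)
            roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical inputs: one object at constant distance 1 from the query,
# another one passing through the query position at t = 3.
f = squared_distance_coeffs((1.0, 0.0), (0.0, 0.0))
g = squared_distance_coeffs((-3.0, 0.0), (1.0, 0.0))
times = critical_times(f, g, 0.0, 10.0)
print(times)  # [2.0, 4.0]
```

Between consecutive critical time-points the order of the two distance functions cannot change, which is why only these times need to be examined when building the lower envelope.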

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical times t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time t31) are merged into LE1234, which introduces the new critical times t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 between tb and te with critical times t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning band are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

Possible paths and possible locations when t = 4

[Figure: road network with the object a and a segment m]

Probability of falling inside segment mp1 at t = 4: 0.5 · 0.5 = 0.25

Probability of an object a falling inside some segment S at a given time instant t (e.g. with a uniform pdf):

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(S \mid PL_p(a, t))$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree "bubbles" along road-network segments. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway examplehellip

[Figure: position (pos) of a car on a highway over time t, bounded by the speeds vmin and vmax, with a query region Q; the car's possible positions overlap Q by 50% at one time and by 30% at another]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: an interior hole with probabilities 0.4 (left neighbour), 0.2 (stay) and 0.4 (right neighbour); a border hole with probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

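› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$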

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
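The propagation of the ball's position can be reproduced with repeated vector-matrix products. A minimal Python sketch, assuming the nine-bucket board and the transition probabilities of M given for this example; the helper names `build_board`, `step` and `after_hits` are illustrative.

```python
def build_board(n=9):
    """Transition matrix of the board: rows = from bucket, cols = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6                  # left border bucket
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4  # right border bucket
    for i in range(1, n - 1):                    # interior buckets
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(vec, M):
    """Row vector times matrix: distribution after one more hit."""
    return [sum(vec[i] * M[i][j] for i in range(len(vec)))
            for j in range(len(M[0]))]

def after_hits(start, M, hits):
    vec = list(start)
    for _ in range(hits):
        vec = step(vec, M)
    return vec

M = build_board()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
long_run = after_hits(start, M, 2000)     # effectively the stationary state
print([round(p, 2) for p in second])
print([round(p, 2) for p in long_run])
```

The long-run vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M: it satisfies the balance condition 0.08 · 0.6 = 0.12 · 0.4 at both borders. Calling `after_hits(start, M, 40)` shows how close the chain is to it after 40 hits.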

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: transition graph over s1, s2, s3 with the edge probabilities of M, unrolled over t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

Propagating the start vector through M:

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3 (all worlds): (0.48, 0.16, 0.36)
t=3 (only worlds that did not hit the window at t=2): (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

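$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

t = 0: (1, 0, 0, 0)
t = 1: (1, 0, 0, 0) · M⁻ = (0, 0, 1, 0)
t = 2: (0, 0, 1, 0) · M⁺ = (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.2, 0.8) · M⁺ = (0, 0, 0.04, 0.96)

[Figure: states s1, s2, s3 and the additional state s4 for the winner trajectories]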
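The two-matrix scheme above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the object starts outside the window; the helper names `vec_mat` and `augment` are hypothetical and not part of the cited paper.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
window_states = {0, 1}   # s1 and s2 (0-based indices)
window_ticks = {2, 3}    # T = [2, 3]

def augment(m, redirect):
    """Append an absorbing 'winner' state; if redirect is set, transitions
    into window states are sent to the winner state instead."""
    n = len(m)
    out = []
    for row in m:
        kept = [0.0 if (redirect and j in window_states) else p
                for j, p in enumerate(row)]
        hit = sum(p for j, p in enumerate(row) if redirect and j in window_states)
        out.append(kept + [hit])
    out.append([0.0] * n + [1.0])  # the winner state is never left
    return out

M_minus = augment(M, redirect=False)  # used at ticks outside the window
M_plus = augment(M, redirect=True)    # used at ticks inside the window

v = [1.0, 0.0, 0.0, 0.0]              # object observed in s1 at t = 0
for t in range(1, 4):
    v = vec_mat(v, M_plus if t in window_ticks else M_minus)
p_hit = v[-1]                         # probability mass in the winner state
```

The cost is one vector-matrix product per tick, which avoids enumerating the exponentially many paths.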

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]

106

Multiple Observations

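State order: (S1−, S2−, S3−, S1+, S2+, S3+), where the classes Si− hold the worlds with ¬∎ (window not yet intersected)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]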

107

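Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the query window and obs the observation that the object is in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

$$\frac{0.32}{0.32 + 0.04} = 0.89$$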
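The doubled state space and the conditioning step can be sketched as follows. This is a minimal illustration in plain Python; the helper names `vec_mat` and `doubled` are hypothetical, and the second observation is assumed to be "object in s3 at t = 3" as in the example.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
window_states = {0, 1}   # s1 and s2
window_ticks = {2, 3}    # T = [2, 3]

def doubled(m, in_window_tick):
    """2|S| x 2|S| matrix over (S1-, S2-, S3-, S1+, S2+, S3+)."""
    size = len(m)
    big = [[0.0] * (2 * size) for _ in range(2 * size)]
    for i in range(size):
        for j in range(size):
            p = m[i][j]
            if in_window_tick and j in window_states:
                big[i][size + j] = p      # a "-" world becomes a "+" world
            else:
                big[i][j] = p             # a "-" world stays a "-" world
            big[size + i][size + j] = p   # "+" worlds stay "+" forever
    return big

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]        # observed in s1 at t = 0
for t in range(1, 4):
    v = vec_mat(v, doubled(M, t in window_ticks))

obs = 2                                    # observed in s3 at t = 3
p_hit_given_obs = v[n + obs] / (v[obs] + v[n + obs])
```

Keeping one "−" and one "+" copy of every state preserves the location of the hit worlds, which is exactly what the conditioning on the second observation needs.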

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: index entries over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Event queries

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.


16

Uncertain Spatial Data Models

• Discrete Models

• Continuous Models

Possible World Semantics

[Figure: examples of discrete and continuous uncertainty models]

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

Possible World Semantics

18

Answering Queries using PWS

Let
• $DB$ be an uncertain database having possible worlds
• $W$ be the set of possible worlds of $DB$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w \in W$ and zero otherwise

The probability that a query predicate $\varphi$ holds on an uncertain database $DB$ is defined as

$$P(\varphi, DB) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$$
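The definition can be evaluated literally by enumerating all possible worlds. The sketch below is an illustration only: the toy database `db`, the predicate `some_object_near_origin` and the assumption that objects are mutually independent (so a world's probability is the product of its instance probabilities) are not taken from the slides.

```python
from itertools import product

# Hypothetical discrete uncertain database:
# object -> list of (alternative position, probability).
db = {
    "A": [((1.0, 1.0), 0.8), ((5.0, 5.0), 0.2)],
    "B": [((0.5, 0.0), 0.4), ((4.0, 0.0), 0.6)],
}

def query_probability(db, predicate):
    """Sum the probabilities of all possible worlds in which the predicate holds."""
    names = list(db)
    total = 0.0
    for combo in product(*(db[name] for name in names)):
        world = {name: pos for name, (pos, _) in zip(names, combo)}
        p_world = 1.0
        for _, p in combo:
            p_world *= p              # independence assumption
        if predicate(world):          # indicator function I(phi, w)
            total += p_world
    return total

def some_object_near_origin(world, radius=2.0):
    return any(x * x + y * y <= radius * radius for x, y in world.values())

p = query_probability(db, some_object_near_origin)
```

The loop visits one world per combination of alternatives, so its running time grows exponentially with the number of objects.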

Possible World Semantics

Possible Worlds Example II

[Figure: uncertain objects labeled A to Z; the following builds (slides 20 to 27) step through individual possible worlds]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects.

Are there efficient solutions to query uncertain spatial data?

In general: no.

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D." [DalviSuciu04]

Can be reduced to uncertain spatial databases.

But: specific queries may have polynomial-time solutions.

[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

29

Querying Uncertain Data Running Example

[Figure: uncertain objects around query point q with a circular query region]

Return the number of objects located in the depicted circular region centered at query point q.

This number is a random variable.

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with the uncertain objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside".

Remaining number of equivalent classes of possible worlds

[Figure: query point q with the remaining objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

[Figure: query point q with the remaining objects C, D, H, I]

36

Generating Functions

[Figure: query point q and uncertain objects (labels A, B, C, H) with probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$

40

For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i} (1 - p_i + p_i \cdot x)$$

In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.

Generating Functions Formally

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
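Expanding the generating function amounts to repeated polynomial multiplication by a linear factor per object. The sketch below is a minimal illustration; the function name `count_distribution` is hypothetical, and the probabilities 0.8, 0.2 and 0.4 are assumed to be the inside-probabilities of three independent objects, read off the figure labels.

```python
def count_distribution(probs):
    """Coefficients of prod_i (1 - p_i + p_i * x); coeffs[k] = P(exactly k inside)."""
    coeffs = [1.0]                      # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: count stays k
            nxt[k + 1] += c * p         # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
```

For n objects this takes O(n²) arithmetic operations instead of visiting all 2ⁿ inside/outside combinations.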

41

Count Queries on Uncertain Data

Example

[Figure: query point q and uncertain objects (labels A, B, C, H) with probabilities 0.8, 0.2, 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
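The sampling scheme can be sketched for the count query of the running example. This is a minimal illustration; the function names, the fixed seeds and the inside-probabilities 0.8, 0.2 and 0.4 (assumed to belong to three independent objects) are assumptions made for the example.

```python
import random

def sample_count(probs, rng):
    """Draw one possible world and return the number of objects inside the region."""
    return sum(rng.random() < p for p in probs)

def estimate_distribution(probs, n_samples, seed=0):
    """Relative frequency of each count over n_samples sampled worlds."""
    rng = random.Random(seed)
    hist = [0] * (len(probs) + 1)
    for _ in range(n_samples):
        hist[sample_count(probs, rng)] += 1
    return [h / n_samples for h in hist]

est = estimate_distribution([0.8, 0.2, 0.4], n_samples=100)
est_large = estimate_distribution([0.8, 0.2, 0.4], n_samples=200_000, seed=1)
```

With 100 samples the estimates can be several percentage points off; increasing the sample size tightens them, which motivates attaching confidence intervals to the estimators.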

48

[Figure: query point q and uncertain objects (labels A, B, C, H) with probabilities 0.8, 0.2, 0.4]

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

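[Figure: query point q and uncertain objects (labels A, B, C, H) with probabilities 0.8, 0.2, 0.4]

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability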
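The interval computation can be sketched with the standard library. The estimate p̂ = 0.54 and the significance level α = 0.05 used below are assumptions: they are chosen because, together with n = 100 samples, they reproduce the interval [0.442, 0.638] quoted above. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # percentile of N(0, 1)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_interval(p_hat=0.54, n=100, alpha=0.05)
```

Because the half-width shrinks with the square root of n, quadrupling the number of sampled worlds halves the width of the interval.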

50

Uncertain Spatial Data Management Summary

Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty

Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results

Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: Generating Functions

Approximations
• Monte-Carlo sampling
• Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the X, Y and Time axes up to the present time, together with its 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: timeline with the marker "now ->"]

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: road network with the locations p1, p2 and q and the times t1, t2 and t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the X and Y axes with the time planes T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 over the X and Y axes with the time planes T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) cover sub-intervals of their parent's interval]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q with the uncertainty disks of Tr1 to Tr5, the distances Rmin and Rmax, and a ring at distance Rd; for Tr4 the minimum distance is > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
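The two probabilities described in words above can be written out as follows. This is a hedged reconstruction from the verbal description, not a transcription of the slide: the symbols $P^{WD}_i$ (within-distance probability) and $pdf^{WD}_j$ (its derivative with respect to $R_d$) are assumed names, and the objects are assumed to be mutually independent.

```latex
% Probability that the location of Tr_i is within distance R_d of Q
P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d while all other objects are farther away,
% integrated over all possible R_d up to R_max
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d,
\qquad pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d)
```

The upper bound $R_{max}$ reflects the pruning observation: beyond it, at least one object is certainly closer to Q, so the product vanishes.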

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain TrQ with Tr1 to Tr5, the distances Rmin and Rmax, and the points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x, y plane, each supported on a disk of diameter 2r]

Example: assuming a uniform pdf of possible locations, but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for $P(\|V_i - V_q\| \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the x, y plane, each supported on a disk of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

a hyperbola, since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example. Lower envelope LE12 of the two distance-trajectories TR1 and TR2, with the critical time-points t11 and t12; time dimension on the horizontal axis, distance on the vertical axis]
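The "at most twice" observation follows from comparing squared distances: each is a quadratic in t, so their difference has at most two roots. The sketch below illustrates this for linearly moving expected locations; the function names and the sample relative positions and velocities are hypothetical.

```python
from math import sqrt

def squared_distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) such that d(t)^2 = A*t^2 + B*t + C for the relative motion
    rel_pos + t * rel_vel (object minus query)."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return (vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y)

def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions intersect."""
    a, b, c = (p - q for p, q in zip(ci, cj))
    if abs(a) < eps:                       # the difference is (at most) linear
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            r = sqrt(disc)
            roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

c1 = squared_distance_coeffs(rel_pos=(-4.0, 1.0), rel_vel=(1.0, 0.0))
c2 = squared_distance_coeffs(rel_pos=(0.0, 3.0), rel_vel=(0.0, -0.5))
times = critical_times(c1, c2, tb=0.0, te=10.0)
```

Between two consecutive critical time-points the ranking of the two distance functions cannot change, which is what the lower-envelope construction relies on.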

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time-point t31), merged into LE1234 over [tb, te] with the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t), i.e.

73

Combine the two probabilitieshellip

Possible paths; possible locations when t = 4

[Figure: object a with its possible paths and the segment m–p1]

Probability of falling inside segment m–p1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(a \in S \mid PL_p(t))$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" along road-network segments around q, with the objects A to E; annotated with earliest/latest times of arrival and a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at critical time instants define an envelope function for the actual QPs

78

Uncertainty ndash the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

What is the probability that the car is in some area at least once during some time?

[Figure: position over time t with the query window Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

[Figure: the car is inside Q with probability 50% at one time instant and with probability 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: an interior hole with the probabilities 0.4 (left), 0.2 (stay), 0.4 (right); a border hole with the probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

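› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket)

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$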

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M² = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M⁴⁰ ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

where M is the 9 × 9 transition matrix of the previous slide.
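The propagation of the ball's position can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names (`build_transition_matrix`, `step`, `propagate`) are hypothetical, and the start position (third bucket) is taken from the start vector shown on the slide. The values (0.08, 0.12, …, 0.12, 0.08) are the stationary distribution of this chain, which the 40th-hit distribution approaches.

```python
def build_transition_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Row i holds the probabilities of moving from bucket i to every bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = border_stay, border_move
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = border_move, border_stay
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M


def step(dist, M):
    """One hit: multiply the row vector dist by the matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, hits):
    """Apply `hits` transitions to the distribution dist."""
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]       # ball starts in the third bucket
first = propagate(start, M, 1)             # (0, 0.4, 0.2, 0.4, 0, ...)
second = propagate(start, M, 2)            # (0.16, 0.16, 0.36, 0.16, 0.16, 0, ...)
fortieth = propagate(start, M, 40)         # close to the stationary distribution
stationary = [0.08] + [0.12] * 7 + [0.08]  # satisfies stationary * M = stationary
```

Each hit costs one vector-matrix product, so the cost grows linearly with the number of hits; with a sparse matrix representation only the non-zero transitions would need to be visited.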

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

State distribution of the car, starting in s1:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Considering only the worlds that have not intersected the window at t = 2, the vector at t = 3 is (0.0, 0.16, 0.04); only the mass 0.04 in s3 never intersects the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories (those that have already intersected the window) and two matrices

103

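ST - Window Queries

States: s1, s2, s3, s4 (s4 is the new state collecting the trajectories that have intersected the window)

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

(1, 0, 0, 0) · M⁻ = (0, 0, 1, 0)
(0, 0, 1, 0) · M⁺ = (0, 0, 0.2, 0.8)
(0, 0, 0.2, 0.8) · M⁺ = (0, 0, 0.04, 0.96)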
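The two-matrix construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`window_matrices`, `window_probability`) are hypothetical, states are numbered from 0 (so s1, s2, s3 become 0, 1, 2), and the sketch assumes the query interval starts at t ≥ 1. M⁻ is used for transitions into a time outside T, M⁺ for transitions into a time inside T, where all mass entering a window state is redirected to the extra absorbing state.

```python
def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def window_matrices(M, window):
    """Return (M_minus, M_plus) over the states of M plus one absorbing 'hit' state."""
    n = len(M)
    absorbing_row = [0.0] * n + [1.0]
    # M_minus: ordinary transitions, the hit state only loops on itself.
    M_minus = [list(row) + [0.0] for row in M] + [absorbing_row]
    # M_plus: probability mass entering a window state is moved to the hit state.
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window:
                new_row[n] += p
            else:
                new_row[j] = p
        M_plus.append(new_row)
    M_plus.append(absorbing_row)
    return M_minus, M_plus


def window_probability(M, start, window, t_from, t_to):
    """Probability of being in a window state at least once during [t_from, t_to]."""
    if t_from < 1:
        raise ValueError("this sketch assumes the query interval starts at t >= 1")
    M_minus, M_plus = window_matrices(M, window)
    v = list(start) + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, M_plus if t >= t_from else M_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=[1.0, 0.0, 0.0], window={0, 1}, t_from=2, t_to=3)
```

For the example of the slides, `result` evaluates to 0.96 (up to floating-point rounding), without enumerating the exponentially many possible paths.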

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

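State order of the vectors: $S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+$ (∎ = the window has been intersected, not ∎ = it has not)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)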

107

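Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory intersects the query window and obs the observation of the object in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Vector at t = 3 over ($S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+$): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$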
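The doubled state space and the final conditioning step can be sketched as below. The names (`doubled_matrices`, `hit_probability_given_observation`) are hypothetical, states are numbered from 0, and the sketch assumes the second observation is made at or after the end of the query interval. Indices 0..n-1 stand for the classes $S_i^-$ and indices n..2n-1 for the classes $S_i^+$.

```python
def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def doubled_matrices(M, window):
    """Build M_minus and M_plus over the 2|S| classes (S-, then S+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            M_minus[i][j] = p          # not hit stays not hit
            M_minus[n + i][n + j] = p  # hit stays hit
            M_plus[n + i][n + j] = p   # hit stays hit
            if j in window:
                M_plus[i][n + j] = p   # entering the window turns S- into S+
            else:
                M_plus[i][j] = p
    return M_minus, M_plus


def hit_probability_given_observation(M, start_state, window, t_from, t_to, t_obs, obs_state):
    """P(window intersected | object observed in obs_state at time t_obs >= t_to)."""
    n = len(M)
    M_minus, M_plus = doubled_matrices(M, window)
    v = [0.0] * (2 * n)
    v[start_state] = 1.0
    for t in range(1, t_obs + 1):
        v = vec_mat(v, M_plus if t_from <= t <= t_to else M_minus)
    hit, no_hit = v[n + obs_state], v[obs_state]
    return hit / (hit + no_hit), v


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
prob, final_vector = hit_probability_given_observation(
    M, start_state=0, window={0, 1}, t_from=2, t_to=3, t_obs=3, obs_state=2)
```

Here `final_vector` is (0, 0, 0.04, 0.48, 0.16, 0.32) and `prob` is 0.32 / (0.32 + 0.04) ≈ 0.89, matching the numbers of the example.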

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: location over time for an object o]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments," IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series," SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • Construction/Representation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)

17

Possible World Semantics

• A collection of uncertain spatial objects defines an uncertain spatial database

• Combinations of object instances define possible database instances, called Possible Worlds

• Assumption: The probability of a possible world can be computed efficiently

18

Answering Queries using PWS

Let
• $DB$ be an uncertain database
• $W$ be the set of possible worlds of $DB$
• $\varphi$ be a query predicate
• $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise

The probability that a query predicate holds on an uncertain database is defined as

$$P(\varphi, DB) = \sum_{w \in W} P(w) \cdot I(\varphi, w)$$

Possible Worlds Example II

[Figure sequence: uncertain objects A to Z]

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with uncertain objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

– Replace uncertain objects by a deterministic "best guess"
  – Expected positions
  – Most-likely positions
  – …

– Query results are not reliable

– Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

q

C D

H

I

36

Generating Functions

[Figure: query region around q with uncertain objects A, B, C; probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

37

Generating Functions

[Figure: query region around q; the objects are anonymized as x; probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$

40

Generating Functions Formally

For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i} (1 - p_i + p_i \cdot x) = \sum_{k} c_k \cdot x^k$$

In the expanded polynomial, the coefficient $c_k$ of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
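The polynomial expansion can be carried out incrementally, one object at a time. The sketch below is an illustration under assumptions: the function name `count_distribution` is hypothetical, objects are assumed to be mutually independent, and the values 0.8, 0.2 and 0.4 are read from the example figure as the probabilities of three objects being inside the query region.

```python
def count_distribution(inside_probs):
    """Coefficients c_0..c_n of prod_i (1 - p_i + p_i * x).

    c_k is the probability that exactly k objects lie inside the query region,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
# dist is approximately [0.096, 0.472, 0.368, 0.064]
```

Each object costs one pass over the current coefficients, so n objects need O(n²) operations instead of enumerating all 2ⁿ inside/outside combinations.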

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{|S|}}$$

where $z$ is the $100 \cdot (1 - \alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$ the true probability is in the interval [0.442, 0.638]

True probability
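The sampling estimator and its Wald interval can be sketched as follows. The function names are hypothetical, and the estimate 0.54 used in the usage lines is an assumption inferred from the interval [0.442, 0.638] quoted above (its midpoint), with 100 samples and a significance level of 0.05.

```python
import random
from statistics import NormalDist


def sample_count_probability(inside_probs, k, n_samples, rng):
    """Monte-Carlo estimate of P(exactly k objects are inside the query region)."""
    hits = 0
    for _ in range(n_samples):
        # Materialize one possible world: each object is inside with its own probability.
        count = sum(1 for p in inside_probs if rng.random() < p)
        if count == k:
            hits += 1
    return hits / n_samples


def wald_interval(p_hat, n_samples, alpha=0.05):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * (p_hat * (1.0 - p_hat) / n_samples) ** 0.5
    return p_hat - half_width, p_hat + half_width


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
estimate = sample_count_probability([0.8, 0.2, 0.4], k=1, n_samples=100, rng=rng)
low, high = wald_interval(0.54, 100)  # about (0.442, 0.638)
```

The interval narrows with the square root of the number of samples, so halving its width requires roughly four times as many sampled worlds.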

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: samples p1 at t1 and p2 at t2 on a road network, query point q, time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location-error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks tb, tb1, t1, t11, te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper nodes such as [Tri11, tb, t11, D11], [Tri12, t11, t12, D12] and [Tri121, t11, t121, D121] cover sub-intervals of their parent]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q; the possible locations of Tr1 to Tr5 are bounded by circles; Rmin and Rmax are marked]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by integrating the pdf of Tri over the disk of radius Rd around Q.

The probability that a given object Trj is the NN of Q is given by requiring that Trj is at distance Rd AND all the rest are at distances > Rd, and integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
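The two probabilities described above can be written out as follows. This is a reconstruction from the verbal description, not the formula shown on the original slide; the symbols $P^{WD}_i$ (within-distance probability), $pdf^{WD}_j$ (its density with respect to $R_d$) and $P^{NN}_j$ are notation assumed here for illustration.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
P^{WD}_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy

% Density of the distance of Tr_j from Q
pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d)

% Tr_j is the NN: it is at distance R_d and all others are farther away,
% integrated over all possible R_d (upper bound R_max)
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d
```

The product term assumes that the locations of the different objects are independent of each other.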

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example:

[Figure: uncertain query location TrQ and uncertain locations Tr1 to Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over the (x, y) plane, each with support of diameter 2r]

Example: assuming a uniform pdf of possible locations, but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane, each with support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdf's)
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

which is a hyperbola since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions
$S_{df} = \{d_{1q}(t), d_{2q}(t), …, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) = 0$)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t))
Call such pair-wise intersections critical time-points

Example: Lower envelope of two distance-trajectories

[Figure: distance functions of TR1 and TR2 over time with their lower envelope LE12; critical time-points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]
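Because the squared distance functions are quadratics in t, the critical time-points of a pair follow from a single quadratic equation. The sketch below illustrates this under assumptions: the function names are hypothetical, each distance function is represented by the coefficients (A, B, C) of its squared distance A·t² + B·t + C, and the relative motion is assumed linear with start offset (x0, y0) and relative velocity (vx, vy).

```python
import math


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 for the relative position (x0 + vx*t, y0 + vy*t)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []          # parallel (or identical) squared distances
        roots = [-c / b]       # the difference is linear: one crossing
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []          # the two functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical example: one object passes by, the other keeps a constant distance of 3.
d1 = squared_distance_coeffs(x0=-5.0, y0=1.0, vx=1.0, vy=0.0)
d2 = squared_distance_coeffs(x0=0.0, y0=3.0, vx=0.0, vy=0.0)
points = critical_time_points(d1, d2, tb=0.0, te=10.0)
```

In this hypothetical example the first object is the nearer one between the two returned time-points (5 ± √8) and the second object is nearer outside of them, which is exactly the information a lower envelope needs.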

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11, t12) and LE34 (of TR3 and TR4, critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd-lower-envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

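Possible paths

Possible locations when t = 4

[Figure: object a on a road network with a segment m–p1]

Probability of falling inside segment m–p1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t: sum, over all possible paths p, the probability of taking p multiplied by the probability that the possible location of a along p at time t lies inside S (e.g. uniform pdf)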
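Combining the two probabilities can be sketched as follows. Everything here is a simplifying assumption for illustration: positions are measured as offsets along a path, the possible locations on a path at time t form one interval with a uniform pdf, and the function names and the concrete offsets are hypothetical.

```python
def overlap_fraction(possible, segment):
    """Fraction of the possible-location interval that lies inside the segment."""
    lo = max(possible[0], segment[0])
    hi = min(possible[1], segment[1])
    length = possible[1] - possible[0]
    return max(0.0, hi - lo) / length if length > 0 else 0.0


def probability_in_segment(paths):
    """Sum over all possible paths: P(path) * P(location on that path lies in S).

    `paths` is a list of (path_probability, possible_interval, segment_interval);
    segment_interval is None when the segment is not part of that path.
    """
    total = 0.0
    for path_prob, possible, segment in paths:
        if segment is not None:
            total += path_prob * overlap_fraction(possible, segment)
    return total


# Two equally likely paths; the segment covers half of the possible locations
# on the first path and is not part of the second path.
paths = [
    (0.5, (2.0, 6.0), (4.0, 6.0)),
    (0.5, (2.0, 6.0), None),
]
prob = probability_in_segment(paths)  # 0.5 * 0.5 = 0.25
```

With a non-uniform location pdf, `overlap_fraction` would be replaced by an integral of that pdf over the overlapping part of the segment.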

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree is "bubbling" along road-network segments; annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]
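The network expansion step can be sketched as a distance-bounded Dijkstra search. This is an illustration under assumptions, not the method of the cited paper: the function name `expansion_edges` is hypothetical, the query point q is assumed to coincide with a vertex, the road network is an undirected adjacency list with edge lengths as weights, and an edge counts as part of the expansion when at least one of its endpoints is closer to q than r.

```python
import heapq


def expansion_edges(graph, q, r):
    """Vertices with network distance < r from q, and the edges touching them.

    graph: dict mapping a vertex to a list of (neighbour, edge_length) pairs.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    edges = set()
    for u in dist:
        for v, _ in graph.get(u, []):
            edges.add(tuple(sorted((u, v))))  # undirected edge, stored once
    return dist, edges


# Hypothetical road network: A - B (2), B - C (2), C - D (5)
graph = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 2.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [("C", 5.0)],
}
reached, edges = expansion_edges(graph, q="A", r=3.0)
```

Only the movement R-trees of the returned edges would then have to be fetched in the filtering step.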

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty ndash the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data-size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
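The idea of trading an acceptable error for data size can be illustrated with the basic Douglas-Peucker line simplification. Note the assumptions: this is the plain recursive algorithm, not the sped-up variant of the cited paper, it treats the points as purely spatial 2D positions (the time dimension is ignored), and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed so that no dropped point deviates more than epsilon."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(route, epsilon=0.1)
```

In this small example the point (1, 0.05) is dropped because it deviates less than the acceptable error from the simplified line, while the remaining four points are kept.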

79

Uncertainty – the other-flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension

vs.

Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension

82

Merge the approaches

› Idea
  – Discretize time
  – For each point in time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example…

[Figure: as before, with the region reachable between the minimum and maximum speed (vmin, vmax)]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example…

[Figure: as before (vmin, vmax, Q)]

Assuming a uniform distribution…

86

A highway example…

[Figure: as before; the car is inside Q with probability 50% at the first time instant and 30% at the second]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: as before (50%, 30%, Q)]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before (50%, 30%, Q)]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5
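The gap between 0.65 and 0.5 can be reproduced with a tiny possible-worlds enumeration. The ten equally likely trajectories below are a hypothetical stand-in for the figure: they are chosen so that the marginals are 0.5 and 0.3, and so that every trajectory inside Q at the second instant was already inside Q at the first (which is what the speed constraints are assumed to enforce here).

```python
from fractions import Fraction

# Each world records (inside Q at first instant, inside Q at second instant).
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5
p_world = Fraction(1, len(worlds))

# Marginal probabilities per time instant.
p_t1 = sum(p_world for a, _ in worlds if a)
p_t2 = sum(p_world for _, b in worlds if b)

# Treating the two instants as independent overestimates the result.
p_independent = 1 - (1 - p_t1) * (1 - p_t2)

# Possible-worlds semantics: count each trajectory once.
p_dependent = sum(p_world for a, b in worlds if a or b)
```

With these assumed worlds, `p_independent` evaluates to 13/20 and `p_dependent` to 1/2, matching the two numbers on the slide.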

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model that can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

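› Whenever the wooden board is hit, the ball stays where it is or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole: stay with probability 0.2, move left or right with probability 0.4 each; border hole: stay with probability 0.4, move to the neighbour with probability 0.6]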

92

A simple example

› Initial position: the ball lies in the third hole

› After first hit: 0.4, 0.2, 0.4 (holes 2 to 4)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (holes 1 to 9)

93

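How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$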

94

How can we model this?

› First hit:

$$
(0\;\;0\;\;1\;\;0\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0)
$$

› Second hit:

$$
(0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0.16\;\;0.16\;\;0.36\;\;0.16\;\;0.16\;\;0\;\;0\;\;0\;\;0)
$$

› 40th hit:

$$
(0\;\;0\;\;1\;\;0\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M^{40} \approx (0.08\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.08)
$$
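The vector-matrix products above can be checked with a few lines of plain Python. This is a minimal sketch; the helper names (`build_board_matrix`, `step`, `after_hits`) are made up for illustration. The vector shown for the 40th hit is the stationary distribution of M, which the chain approaches as the number of hits grows.

```python
def build_board_matrix(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, M, k):
    for _ in range(k):
        dist = step(dist, M)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third hole
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)

# Stationary distribution: unchanged by a further hit.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

Because each step only touches the non-zero entries of a sparse matrix, the same propagation scales to large state spaces.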

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
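One straightforward way to learn the transition probabilities is to count observed state changes and normalize per source state. The sketch below assumes the historical data is already map-matched into state sequences with one state per tick; the function name and the toy history are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for states in trajectories:
        for cur, nxt in zip(states, states[1:]):
            counts[cur][nxt] += 1
    # Store only observed transitions, i.e. a sparse representation.
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

history = [
    ["s1", "s3", "s2", "s1"],
    ["s3", "s3", "s2", "s3"],
    ["s2", "s3", "s2", "s1"],
]
P = estimate_transitions(history)
```

Separate matrices per time of day or per object group can be obtained by running the same counting on the corresponding subsets of the history.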

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: state graph over s1, s2, s3 with the transition probabilities as edge labels]

Note: we have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

98

99

ST - Window Queries

t=0: (1, 0, 0); t=1: (0, 0, 1)

100

ST - Window Queries

t=0: (1, 0, 0); t=1: (0, 0, 1); t=2: (0, 0.8, 0.2)

101

ST - Window Queries

t=0: (1, 0, 0); t=1: (0, 0, 1); t=2: (0, 0.8, 0.2); t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

t=0: (1, 0, 0); t=1: (0, 0, 1); t=2: (0, 0.8, 0.2), of which 0.8 is already inside the window (s2); t=3: the remaining 0.2 is propagated to (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

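$$
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

State order: (s1, s2, s3, s4), where s4 is the new state for the winner trajectories

t=0: (1, 0, 0, 0); t=1: (0, 0, 1, 0); t=2: (0, 0, 0.2, 0.8); t=3: (0, 0, 0.04, 0.96)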
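The two matrices can be derived mechanically from M and the set of window states, which gives a compact way to evaluate the query. This is a minimal sketch; the function names and the 0-based state indices are assumptions for illustration, and the reading that M− is applied for ticks outside the query interval and M+ for ticks inside it is inferred from the state vectors on the slide.

```python
def step(dist, M):
    """Row vector times matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(len(M[0]))]

def window_matrices(M, window):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in M] + [absorbing]
    m_plus = []
    for row in M:
        # Transitions into the window are redirected to the hit state.
        new = [0.0 if j in window else p for j, p in enumerate(row)]
        new.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new)
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(M, start, window, t_from, t_to):
    """P(object is in a window state at some tick t in [t_from, t_to])."""
    m_minus, m_plus = window_matrices(M, window)
    dist = list(start) + [0.0]
    if t_from == 0:  # mass that starts inside the window is a hit already
        for j in window:
            dist[-1] += dist[j]
            dist[j] = 0.0
    for t in range(1, t_to + 1):
        dist = step(dist, m_plus if t >= t_from else m_minus)
    return dist[-1]

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
m_minus, m_plus = window_matrices(M, {0, 1})
result = window_probability(M, [1, 0, 0], {0, 1}, 2, 3)
```

The cost is one vector-matrix product per tick, instead of enumerating the exponentially many paths.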

104

Multiple Observations

› So far we had only one observation, from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; left: a single observation at t0; right: two observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

106

Multiple Observations

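State order: (S1−, S2−, S3−, S1+, S2+, S3+)

t=0: (1, 0, 0, 0, 0, 0); t=1: (0, 0, 1, 0, 0, 0); t=2: (0, 0, 0.2, 0, 0.8, 0); t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$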

107

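Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With H = "the trajectory intersects the window" and O = "the object is observed in s3 at t=3":

$$
P(H \mid O) = \frac{P(O \mid H) \cdot P(H)}{P(O)} = \frac{P(H \wedge O)}{P(O)} = \frac{P(H \wedge O)}{P(H \wedge O) + P(\neg H \wedge O)} = \frac{0.32}{0.32 + 0.04} \approx 0.89
$$

Here 0.32 is the probability of class S3+ and 0.04 the probability of class S3− at t=3, in the state order (S1−, S2−, S3−, S1+, S2+, S3+).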
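The same computation can be sketched in code by doubling the state space into "not yet hit" (−) and "already hit" (+) copies and conditioning on the second observation at the end. Function names and 0-based indices are assumptions for illustration; the window is {s1, s2}, the query interval is T = [2,3], and the object is assumed to be observed in s3 at t=3.

```python
def step(dist, M):
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def doubled_matrices(M, window):
    """States 0..n-1 are the '-' classes, n..2n-1 the '+' classes."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] = p          # outside T: '-' stays '-'
            m_minus[n + i][n + j] = p  # '+' always stays '+'
            m_plus[n + i][n + j] = p
            if j in window:
                m_plus[i][n + j] = p   # entering the window during T: '-' becomes '+'
            else:
                m_plus[i][j] = p
    return m_minus, m_plus

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
n = len(M)
m_minus, m_plus = doubled_matrices(M, {0, 1})

dist = [1.0, 0, 0, 0, 0, 0]        # observed in s1 at t=0, no hit yet
dist = step(dist, m_minus)         # t=1 (outside T)
dist = step(dist, m_plus)          # t=2 (inside T)
dist = step(dist, m_plus)          # t=3 (inside T)

observed = 2                       # index of s3
posterior = dist[n + observed] / (dist[observed] + dist[n + observed])
```

The final division is exactly the Bayes step: the mass of S3+ divided by the total mass of worlds consistent with the observation.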

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entry (Poid) in the time/location space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


18

Answering Queries using PWS

Let
• DB be an uncertain database,
• W be the set of possible worlds of DB,
• φ be a query predicate,
• I(φ, w) be an indicator function returning one if predicate φ holds in world w ∈ W, and zero otherwise.

The probability that a query predicate φ holds on an uncertain database DB is defined as

$$
P(\varphi, DB) = \sum_{w \in W} I(\varphi, w) \cdot P(w)
$$

Possible World Semantics
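A brute-force evaluation under possible world semantics can be sketched as follows. The sketch assumes a discrete model in which each object has a few alternative positions with probabilities and objects are mutually independent; the database contents, the query point, and the function names are hypothetical.

```python
from itertools import product
from math import hypot

def probability(db, predicate):
    """Sum the probabilities of all possible worlds in which predicate holds."""
    objects = list(db)
    total = 0.0
    for choice in product(*(db[o] for o in objects)):
        world = {o: pos for o, (pos, _) in zip(objects, choice)}
        p_world = 1.0
        for _, p in choice:
            p_world *= p  # independence between objects (assumption)
        if predicate(world):
            total += p_world
    return total

# Each object: list of (position, probability) alternatives.
db = {
    "A": [((1, 1), 0.8), ((5, 5), 0.2)],
    "B": [((0, 2), 0.2), ((6, 0), 0.8)],
}
q, radius = (0, 0), 2.0

def some_object_in_range(world):
    return any(hypot(x - q[0], y - q[1]) <= radius for x, y in world.values())

p = probability(db, some_object_in_range)
```

The loop visits every combination of alternatives, so its running time grows exponentially with the number of objects.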

Possible Worlds Example II

[Figure: uncertain objects labelled A to Z]

20

Possible Worlds Example II

[Figure: uncertain objects labelled A to Z]

21

22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic ldquobest guessrdquo

Expected Positions

Most-likely Positions

hellip

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query regionInformation about concrete results objects can be discarded

q

C D

H

I

36

[Figure: query region around q with the uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

37

[Figure: as before, with the objects anonymized as x]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e., substitute A, B, C by x

38

[Figure: as before, with the objects anonymized as x]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e., substitute A, B, C by x

39

[Figure: as before, with the objects anonymized as x]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e., substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

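Generating Functions: Formally

For each object o_i, let p_i denote the probability that o_i is inside the query region. Consider the following generating function [2]:

$$
F(x) = \prod_{i=1}^{N} \left(1 - p_i + p_i \cdot x\right) = \sum_{k=0}^{N} c_k \cdot x^k
$$

In the expanded polynomial, the coefficient c_k of the monomial x^k equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)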
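The expansion can be carried out by multiplying in one linear factor at a time, which takes O(N²) operations instead of enumerating 2^N worlds. A minimal sketch, assuming independent objects and using the three probability labels from the running example (0.8, 0.2, 0.4); the function name is hypothetical.

```python
def count_distribution(probs):
    """Coefficients c_0..c_N of the product of (1 - p_i + p_i * x)."""
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)   # object outside: count unchanged
            nxt[k + 1] += c * p     # object inside: count increases by one
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
```

For these inputs the coefficients are 0.096, 0.472, 0.368 and 0.064, i.e. the probabilities of finding exactly 0, 1, 2 and 3 objects in the query region.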

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of the reliability or confidence of the estimates

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test

where z is the corresponding percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability
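A sketch of sampling plus a Wald confidence interval follows. The estimate 0.54 and the 95% level (z ≈ 1.96) are back-calculated from the interval [0.442, 0.638] with 100 samples and are assumptions, not values stated on the slide; the object probabilities 0.8, 0.2, 0.4 and all function names are likewise illustrative.

```python
import math
import random

def sample_count(probs, rng):
    """Draw one possible world and count the objects inside the region."""
    return sum(rng.random() < p for p in probs)

def estimate(probs, k, n, rng):
    """Fraction of n sampled worlds with exactly k objects inside."""
    hits = sum(sample_count(probs, rng) == k for _ in range(n))
    return hits / n

def wald_interval(p_hat, n, z=1.959964):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Interval for an assumed estimate of 0.54 from 100 samples.
lo, hi = wald_interval(0.54, 100)

# A larger sample narrows the interval around the exact value 0.472.
rng = random.Random(0)
p_hat = estimate([0.8, 0.2, 0.4], 1, 100_000, rng)
lo_big, hi_big = wald_interval(p_hat, 100_000)
```

The interval width shrinks with the square root of the sample size, so halving it requires four times as many sampled worlds.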

50

Uncertain Spatial Data Management Summary

Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty

Data Cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results

Paradigm of Equivalent Worlds
  – Efficient solution for the most prominent types of spatial queries
  – Example: generating functions

Approximations
  – Monte Carlo sampling
  – Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: locations p1 and p2 at times t1 and t2, query point q, time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants tb, tb1, t1, t11, te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper levels split each interval into sub-intervals, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], [Tri121, t11, t121, D121], …]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children
- If the parent is excluded: the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and uncertain objects Tr1 to Tr5 with the distances Rmin, Rmax and Rd; Rmin > Rmax for the pruned object]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
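The two formulas did not survive in this transcript. A standard way to write what the surrounding sentences describe is sketched below; the symbols P^{WD}_i (within-distance probability), pdf_i (location pdf of Tr_i) and pdf^{WD}_j (density of the distance of Tr_j from Q) are notational assumptions, not necessarily the authors' own.

$$
P^{WD}_i(R_d) = \iint_{\{(x,y)\,:\,\lVert (x,y) - Q \rVert \le R_d\}} pdf_i(x, y)\, dx\, dy
$$

$$
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d,
\qquad pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d)
$$

The product term is the probability that every other object is farther away than R_d, which assumes that the objects' locations are mutually independent.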

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain TrQ and Tr1 to Tr5 with Rmin, Rmax and the points Z1, Z2; dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf over the (x, y) plane for pdf(Trq), pdf(Tr1) and pdf(Tr2)]

Example: assuming a uniform pdf of the possible locations; shown are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf over the (x, y) plane for pdf(Tr1 – Trq) and pdf(Tr2 – Trq)]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdf's). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of two distance-trajectories TR1 and TR2 (distance over time, with the time dimension on the horizontal axis), with critical time-points t11 and t12 after tb]
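Why two distance functions meet at most twice can be seen by comparing squared distances. The sketch below assumes linear motion between samples, so that the squared distance of the expected location from the origin is a quadratic A·t² + B·t + C in t; the helper names and sample values are hypothetical.

```python
import math

def squared_distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A*t^2 + B*t + C for relative linear motion."""
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_time_points(f, g, tb, te):
    """Times in [tb, te] at which two distance functions are equal."""
    # d_i(t) = d_j(t) iff the squared distances agree: a quadratic equation.
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

d1 = squared_distance_coeffs((-4.0, 3.0), (1.0, 0.0))  # t^2 - 8t + 25
d2 = squared_distance_coeffs((0.0, 4.0), (0.0, 0.0))   # constant 16
points = critical_time_points(d1, d2, 0.0, 10.0)
```

Since the difference of two quadratics is again a quadratic, it has at most two real roots, which bounds the number of critical time-points per pair.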

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], introducing the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is farther than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t), i.e.

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with segment m–p1]

Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):

$$
\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \mid PL^{a}_{p}(t)\big)
$$
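The combination of path probability and location probability can be sketched as follows, assuming a uniform pdf over the possible locations of each path. The representation of possible locations as (edge, from, to) offset intervals, the edge names, and the numbers are hypothetical and chosen to reproduce the 0.5 · 0.5 = 0.25 example.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of two intervals."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))

def prob_inside(paths, segment):
    """Sum over paths of Pr(path) * Pr(inside segment | path)."""
    edge, s_lo, s_hi = segment
    total = 0.0
    for p_path, locations in paths:
        length = sum(hi - lo for _, lo, hi in locations)
        inside = sum(overlap(lo, hi, s_lo, s_hi)
                     for e, lo, hi in locations if e == edge)
        if length > 0:
            total += p_path * inside / length  # uniform pdf along the path
    return total

# Two equally likely possible paths; possible locations at t = 4.
paths = [
    (0.5, [("v1v5", 2.0, 6.0)]),
    (0.5, [("v1v3", 1.0, 3.0), ("v3v4", 0.0, 1.0)]),
]
p = prob_inside(paths, ("v1v5", 4.0, 6.0))
```

Here the first path is taken with probability 0.5 and the segment covers half of its possible locations, so `p` evaluates to 0.25.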

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time intervals intersect with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree “bubbling” along road-network segments; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data-size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
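The idea of trading an acceptable error for data size can be illustrated with the classic recursive Douglas-Peucker line simplification. The sketch below is a plain textbook version in Python, not the sped-up variant of the cited paper; the function names and the sample track are made up for the example, and time is ignored (only 2D positions are simplified).

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # everything in between is within the error bound
    left = simplify(points[: idx + 1], epsilon)
    right = simplify(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
print(simplify(track, 1.0))
```

A larger epsilon transmits fewer points but increases the location uncertainty between the retained samples.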

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long should the sink data be allowed to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension

vs

Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435-443

83

A highway example…

[Figure: axes t vs. pos; query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: axes t vs. pos; speed bounds vmin and vmax; query window Q]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: axes t vs. pos; speed bounds vmin and vmax; query window Q]

Assuming uniform distribution…

86

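A highway example…

[Figure: axes t vs. pos; query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65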

87

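A highway example…

[Figure: axes t vs. pos; query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints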

88

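A highway example…

[Figure: axes t vs. pos; query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5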
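The gap between the two numbers can be reproduced with a toy set of possible worlds. The Python sketch below is only an illustration: the ten equally likely trajectories are hypothetical, chosen such that every trajectory inside Q at the second time instant was already inside Q at the first one (one way in which speed constraints can couple the two time instants).

```python
# Ten equally likely trajectories (hypothetical). Each entry records whether
# the car is inside the query window Q at the first and at the second time instant.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

p_t1 = sum(a for a, _ in worlds) / len(worlds)  # marginal probability at the first instant
p_t2 = sum(b for _, b in worlds) / len(worlds)  # marginal probability at the second instant

# Treating the two time instants as independent events.
p_independent = 1 - (1 - p_t1) * (1 - p_t2)

# Counting the worlds that are inside Q at least once.
p_dependent = sum(a or b for a, b in worlds) / len(worlds)

print(p_t1, p_t2, p_independent, p_dependent)
```

The marginals are 0.5 and 0.3 in both computations; only the second one respects the dependency between consecutive locations.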

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

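› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: board with a row of holes; inner hole: probabilities 0.4 / 0.2 / 0.4 (left neighbour / stay / right neighbour); border hole: probabilities 0.6 and 0.4]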

92

A simple example

› Initial position

› After first hit: 0.4 0.2 0.4

› After second hit: 0.16 0.16 0.36 0.16 0.16

› After 40th hit: 0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08

93

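How can we model this?

› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$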

94

How can we model this?

› First hit

$(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M = (0\;0.4\;0.2\;0.4\;0\;0\;0\;0\;0)$

› Second hit

$(0\;0.4\;0.2\;0.4\;0\;0\;0\;0\;0) \cdot M = (0.16\;0.16\;0.36\;0.16\;0.16\;0\;0\;0\;0)$

› 40th hit

$(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M^{40} \approx (0.08\;0.12\;0.12\;0.12\;0.12\;0.12\;0.12\;0.12\;0.08)$
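The vector-matrix products above can be replayed with a short Python sketch. It uses plain lists instead of a matrix library; the helper names are made up for this example, and the board size of nine buckets follows the transition matrix of the slides.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example: rows = from bucket, columns = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: multiply the row vector dist with M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, M, hits):
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
for hits in (1, 2, 40):
    print(hits, [round(p, 2) for p in after_hits(start, M, hits)])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is a stationary distribution of M, i.e., multiplying it with M returns the same vector; repeated hits move the distribution towards it.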

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: state-transition graph over s1, s2, s3 with the edge probabilities of M]

Note: We have an exponential number of possible paths the car might take

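ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the edge probabilities of M]

State distribution over (s1, s2, s3): t = 0: (1, 0, 0)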

98

99

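ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the edge probabilities of M]

State distribution over (s1, s2, s3): t = 0: (1, 0, 0); t = 1: (0, 0, 1)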

100

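ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the edge probabilities of M]

State distribution over (s1, s2, s3): t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2)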

101

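ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the edge probabilities of M]

State distribution over (s1, s2, s3): t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3: (0.48, 0.16, 0.36)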

102

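ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the edge probabilities of M]

State distribution over (s1, s2, s3): t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3, restricted to the worlds that were not in s1 or s2 at t = 2: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices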

103

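ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distributions over (s1, s2, s3, s4): (1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)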
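The two-matrix construction can be sketched in Python as follows. This is a minimal illustration, not the implementation of the cited paper: the function name and the zero-based state indices are assumptions, the object is assumed to be observed with certainty at t = 0, and the window is assumed to start at t ≥ 1.

```python
def window_query_probability(M, start, window, t_start, t_end):
    """Probability that the object is in one of the `window` states at some t in [t_start, t_end]."""
    n = len(M)
    hit = n  # index of the additional absorbing state for the "winner" trajectories

    def extend(redirect):
        # M- keeps all transitions; M+ redirects transitions into window states to `hit`.
        ext = [[0.0] * (n + 1) for _ in range(n + 1)]
        ext[hit][hit] = 1.0
        for i in range(n):
            for j in range(n):
                target = hit if (redirect and j in window) else j
                ext[i][target] += M[i][j]
        return ext

    m_minus, m_plus = extend(False), extend(True)
    dist = [0.0] * (n + 1)
    dist[start] = 1.0
    for t in range(1, t_end + 1):
        mat = m_plus if t >= t_start else m_minus
        dist = [sum(dist[i] * mat[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist[hit]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
# Car starts in s1 (index 0); window = {s1, s2}; T = [2, 3].
print(round(window_query_probability(M, 0, {0, 1}, 2, 3), 2))
```

Because every world that enters the window is moved to the absorbing state, no world is counted twice, and the number of matrix-vector products grows linearly with the length of the time horizon instead of exponentially with the number of paths.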

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two panels, axes time vs. location space; left: one observation at t0; right: observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

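State distributions over (S1-, S2-, S3-, S1+, S2+, S3+) at t = 0, 1, 2, 3 (Si-: ¬∎, Si+: ∎):

(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$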

107

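Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

where ∎ denotes the event that the trajectory passes the query window and obs the observation of the object in s3.

State distribution over (S1-, S2-, S3-, S1+, S2+, S3+) at t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)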
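The doubled state space and the final conditioning step can be replayed in Python. The sketch below is an illustration under assumptions: the helper names and the zero-based indexing (Si- at index i, Si+ at index n + i) are chosen for this example, and the second observation is taken to be at t = 3.

```python
def double_state_matrices(M, window):
    """Build M- and M+ over the states (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] += p          # outside the query time: classes are kept
            m_minus[n + i][n + j] += p
            m_plus[n + i][n + j] += p   # worlds that already hit stay in the "+" classes
            if j in window:
                m_plus[i][n + j] += p   # entering the window moves a world to a "+" class
            else:
                m_plus[i][j] += p
    return m_minus, m_plus


def step(dist, mat):
    size = len(mat)
    return [sum(dist[i] * mat[i][j] for i in range(size)) for j in range(size)]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
m_minus, m_plus = double_state_matrices(M, window={0, 1})

dist = [1, 0, 0, 0, 0, 0]               # observed in s1 at t = 0, window not yet intersected
for mat in (m_minus, m_plus, m_plus):   # transitions into t = 1, 2, 3; T = [2, 3]
    dist = step(dist, mat)

seen = 2                                # second observation: s3 at t = 3
p_hit_given_obs = dist[3 + seen] / (dist[seen] + dist[3 + seen])
print([round(p, 2) for p in dist], round(p_hit_given_obs, 2))
```

Conditioning on the observation only requires the two entries S3- and S3+ of the final vector, which is exactly the ratio used in the Bayes step above.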

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: axes time vs. location; label P(oid)]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
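To see why drawing samples that conform with the observations is an issue, consider the naive baseline: sample trajectories from the a-priori chain and reject those that contradict the second observation. The Python sketch below shows this baseline only; it is not the adapted-transition-matrix method of the cited paper. The function names, the seed and the reuse of the three-state window-query example (start in s1, seen in s3 at t = 3) are assumptions made for the illustration.

```python
import random


def sample_trajectory(M, start, length, rng):
    """Draw one trajectory of `length` transitions from the Markov chain M."""
    states = [start]
    for _ in range(length):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_hit_probability(M, start, observed, window, t_start, t_end, n, seed=0):
    """Rejection sampling: keep only trajectories that end in the observed state."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n):
        traj = sample_trajectory(M, start, t_end, rng)
        if traj[t_end] != observed:
            continue  # sample contradicts the observation and is wasted
        accepted += 1
        if any(traj[t] in window for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / accepted if accepted else float("nan")


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(estimate_hit_probability(M, 0, 2, {0, 1}, 2, 3, n=20000, seed=1))
```

In this toy setting only about a third of the samples survive the rejection step; with more observations or a larger state space the acceptance rate drops quickly, which motivates sampling from transition matrices that already take the observations into account.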

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


Possible Worlds Example II

[Figure: uncertain objects labeled A to Z]


21


22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial time solutions

[DalviSuciu04] Dalvi, N. N., and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic “best guess”

Expected Positions

Most-likely Positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate “inside”

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate “inside”

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded

q

C D

H

I

36

[Figure: query region around q; object labels H, C, A, B, H3; probability labels 0.8, 0.2, 0.4]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

37

[Figure: query region around q; the objects A, B, C relabeled as x; probability labels 0.8, 0.2, 0.4]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x


39

[Figure: query region around q; the objects A, B, C relabeled as x; probability labels 0.8, 0.2, 0.4]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c \cdot x^k$ implies that the probability of having exactly k results equals c

40

Generating Functions Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_i (1 - p_i + p_i \cdot x)$

In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
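The polynomial expansion can be carried out incrementally, one object at a time. The Python sketch below is a minimal illustration; the function name is made up, and the three probabilities 0.8, 0.2 and 0.4 are taken from the labels of the running example under the assumption that they are the probabilities of three independent objects being inside the query region.

```python
def count_distribution(probabilities):
    """Coefficients of prod_i (1 - p_i + p_i * x): entry k = Pr(exactly k objects inside)."""
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)   # object outside: the count stays k
            nxt[k + 1] += c * p     # object inside: the count becomes k + 1
        coeffs = nxt
    return coeffs


print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
```

Each multiplication costs time linear in the current degree, so the full count distribution of n objects is obtained in O(n²) time instead of enumerating the 2ⁿ inside/outside combinations.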

41

Count Queries on Uncertain Data

Example

[Figure: query region around q; object labels H, C, A, B, H3; probability labels 0.8, 0.2, 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied

I A traditional query on certain data can be answered in polynomial time

II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

[Figure: query region around q; object labels H, C, A, B, H3; probability labels 0.8, 0.2, 0.4]

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

[Figure: query region around q; object labels H, C, A, B, H3; probability labels 0.8, 0.2, 0.4]

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test

Where is the percentile of the standard normal distribution

At a significance level of the true probability is in the interval [0.442, 0.638]

True probability
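A Wald-type confidence interval for a sampled probability can be computed as sketched below in Python. The concrete numbers are assumptions, since the estimate and the significance level did not survive in the text: an estimate of 0.54 from n = 100 sampled worlds at a significance level of 0.05 happens to reproduce the interval [0.442, 0.638] quoted above.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    z is the (1 - alpha / 2) percentile of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed values: 54 of 100 sampled worlds satisfy the predicate, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))
```

The interval shrinks with the square root of the number of samples, so halving its width requires four times as many sampled worlds.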

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: “best guess” answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: locations p1, p2 and q on a road network; times t1, t2 on the t axis]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: axes X, Y and time (T = tb, T = t1, T = te); trajectories Trq, Tr1, Tr2, Tr3]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: axes X, Y and time (T = tb, T = tb1, T = t1, T = t11, T = te); trajectories Trq, Tr1, Tr2, Tr3]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree; root [TrQ, tb, te]; Level 1 nodes [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]; deeper levels hold nodes of the same form (e.g., [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], [Tri121, t11, t121, D121]) for sub-intervals of the interval of their parent]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q; uncertainty disks of Tr1 to Tr5; distances Rmin, Rmax and Rd marked; annotation Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it’s at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
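The pruning step of this observation can be sketched in Python. The sketch rests on assumptions that are not spelled out in the text: Rmax is taken to be the smallest of the maximal distances between Q and any uncertainty disk, each disk is given as (center x, center y, radius), and the function name and sample coordinates are made up.

```python
import math


def prune_candidates(q, disks):
    """Return the names of objects that can still be the NN of the crisp point q.

    disks maps an object name to (center_x, center_y, radius) of its uncertainty disk.
    """
    def center_distance(disk):
        return math.hypot(disk[0] - q[0], disk[1] - q[1])

    # Some object is guaranteed to be within r_max of q (assumed meaning of Rmax).
    r_max = min(center_distance(d) + d[2] for d in disks.values())
    return {
        name
        for name, d in disks.items()
        if max(0.0, center_distance(d) - d[2]) <= r_max  # min-distance check
    }


disks = {"Tr1": (2.0, 0.0, 1.0), "Tr2": (3.0, 0.0, 1.0), "Tr4": (10.0, 0.0, 1.0)}
print(sorted(prune_candidates((0.0, 0.0), disks)))
```

Only the surviving candidates need to enter the (expensive) integration over the possible distances Rd.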

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1 to Tr5; Rmin and Rmax marked; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf over the (x, y) plane; uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of diameter 2r with height 1/(πr²)]

Example (assuming a uniform pdf of possible locations): but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf over the (x, y) plane, x from -40 to +40; pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over disks of diameter 4r with peak 3/(4πr²)]

Let Viq = Vi + (-Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdf’s). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

hyperbola since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example:

[Figure: lower envelope LE12 of the two distance functions TR1 and TR2 over [tb, te]; critical time-points t11 and t12; NOTE: time-dimension on horizontal axis]
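The critical time-points of two distance functions can be computed by equating their squares. The Python sketch below assumes that each squared distance function is a quadratic a·t² + b·t + c (consistent with the hyperbola remark on the previous slide); the coefficient tuples and the function name are hypothetical.

```python
import math


def critical_time_points(f, g, tb, te):
    """Times in [tb, te] at which two distance functions intersect.

    f and g are coefficient tuples (a, b, c) of the squared distances
    a * t**2 + b * t + c; equating them yields at most two solutions.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Squared distances (t - 2)^2 + 1 and the constant 2 intersect at t = 1 and t = 3.
print(critical_time_points((1, -4, 5), (0, 0, 2), 0, 10))
```

Between two consecutive critical time-points the order of the two distance functions cannot change, which is what makes a merge-based construction of the lower envelope possible.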

70

The continuous aspect of the probabilistic NN queries

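Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11 and t12) and LE34 (of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new; distance on the vertical axis]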

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
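The merge-sort style construction can be sketched in Python as follows. This is only an illustration under stated assumptions: each distance function is given as the (A, B, C) coefficients of its squared distance, an envelope is a list of (start, end, index) pieces, and all helper names are hypothetical. The sketch follows the divide-and-conquer pattern of the slides without claiming to match the authors' implementation or its exact complexity.

```python
import math

def value(f, t):
    """Squared distance A*t^2 + B*t + C at time t."""
    A, B, C = f
    return A * t * t + B * t + C

def crossings(f, g, lo, hi):
    """Times strictly inside (lo, hi) where f and g are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in set(roots) if lo < t < hi)

def merge(env1, env2, funcs):
    """Merge two envelopes that cover the same time interval."""
    out, i, j = [], 0, 0
    while i < len(env1) and j < len(env2):
        s1, e1, a = env1[i]
        s2, e2, b = env2[j]
        lo, hi = max(s1, s2), min(e1, e2)
        cuts = [lo] + crossings(funcs[a], funcs[b], lo, hi) + [hi]
        for s, e in zip(cuts, cuts[1:]):
            mid = 0.5 * (s + e)
            win = a if value(funcs[a], mid) <= value(funcs[b], mid) else b
            if out and out[-1][2] == win:
                out[-1] = (out[-1][0], e, win)   # extend the previous piece
            else:
                out.append((s, e, win))
        if e1 == hi:
            i += 1
        if e2 == hi:
            j += 1
    return out

def lower_envelope(indices, funcs, tb, te):
    """Divide and conquer: split the set, recurse, merge the two envelopes."""
    if len(indices) == 1:
        return [(tb, te, indices[0])]
    half = len(indices) // 2
    return merge(lower_envelope(indices[:half], funcs, tb, te),
                 lower_envelope(indices[half:], funcs, tb, te), funcs)

# Hypothetical input: one passing object and two static ones.
funcs = [(1.0, -10.0, 26.0), (0.0, 0.0, 9.0), (0.0, 0.0, 100.0)]
envelope = lower_envelope([0, 1, 2], funcs, 0.0, 10.0)
```

In this example the static object at squared distance 9 is the nearest neighbour at the beginning and at the end of the interval, and the passing object takes over in between.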

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; shown are the lower envelope, the 2nd-lower-envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning bound]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.

Example: if the pdf of selecting a possible path is uniform, every path in PPi(a) is equally likely.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t).
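The definition of the possible paths can be illustrated with a small Python sketch. It assumes that both samples lie on vertices and that the road network is a dictionary mapping each vertex to its neighbours and the minimum travel time of the connecting edge. The graph, its weights, and the function name are hypothetical and only serve as an example.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose minimum travel time
    does not exceed budget. graph[u] maps neighbour -> minimum time."""
    found = []

    def extend(path, cost):
        u = path[-1]
        if u == dst:
            found.append((list(path), cost))
            return
        for v, w in graph[u].items():
            if v not in path and cost + w <= budget:
                path.append(v)
                extend(path, cost + w)
                path.pop()

    extend([src], 0.0)
    return found

# Hypothetical road network with minimum travel times per edge.
graph = {
    "v1": {"v5": 4, "v3": 2},
    "v3": {"v1": 2, "v4": 3},
    "v4": {"v3": 3, "v5": 2},
    "v5": {"v1": 4, "v4": 2},
}
paths = possible_paths(graph, "v1", "v5", budget=7)
# Uniform pdf over the possible paths, as in the example on the slide.
path_probability = 1.0 / len(paths)
```

With a time budget of 7 both routes qualify, so each one gets probability 0.5 under the uniform assumption. With a budget of 6 only the direct edge remains.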

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP_a(t)} \Pr(p) \cdot \Pr(S \mid PL_p(t))$ (e.g. with a uniform pdf)

74

Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
  – The collection of all the possible paths (and their probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
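As an illustration of error-bounded data reduction, here is a minimal Python sketch of the basic recursive Douglas-Peucker line simplification. Note that this is the plain textbook variant, not the sped-up version from the cited paper, and that it measures the error as the distance to the segment between the retained end points. The sample points and the tolerance are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed so that no dropped point is further
    than eps from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= eps:
        return [points[0], points[-1]]
    split = k + 1                       # index of the farthest point
    left = douglas_peucker(points[:split + 1], eps)
    right = douglas_peucker(points[split:], eps)
    return left[:-1] + right            # do not duplicate the split point

# Hypothetical location samples: nearly straight movement.
samples = [(0, 0), (1, 0.05), (2, 0), (3, 0.04), (4, 0)]
reduced = douglas_peucker(samples, eps=0.1)
```

The tolerance eps plays the role of the acceptable error: a larger value transmits fewer points at the price of a larger location uncertainty at the receiver.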

79

Uncertainty – the other flip-side
› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not a "Z" axis… one cannot travel back in time
  – How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) for a car on a highway; the reachable positions are bounded by the minimum and maximum speeds vmin and vmax; query window Q]

Q: What is the probability that the car is in some area at least once during some time?

Assuming a uniform distribution, the car is inside the query area with probabilities 50% and 30% at the two time instants of the query.

› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
  – Violation of the speed constraints
› Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete Time + Discrete Space (e.g. Markov Chain)
  – Discrete Time + Continuous Space (e.g. Harris Chain)
  – Continuous Time + Discrete Space (e.g. Markov Process)
  – Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole: 0.4 to the left, 0.2 stay, 0.4 to the right; border hole: 0.6 to the neighbouring hole, 0.4 stay]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
0.4 0.6 0   0   0   0   0   0   0
0.4 0.2 0.4 0   0   0   0   0   0
0   0.4 0.2 0.4 0   0   0   0   0
0   0   0.4 0.2 0.4 0   0   0   0
0   0   0   0.4 0.2 0.4 0   0   0
0   0   0   0   0.4 0.2 0.4 0   0
0   0   0   0   0   0.4 0.2 0.4 0
0   0   0   0   0   0   0.4 0.2 0.4
0   0   0   0   0   0   0   0.6 0.4

94

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
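The propagation of the ball's position can be reproduced with a few lines of plain Python. The sketch below rebuilds the 9 × 9 matrix M of the example and multiplies the state vector with it; the helper names are illustrative. The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, i.e. it is left unchanged by M, and repeated hits move the distribution towards it.

```python
def build_board(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6                   # left border
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4   # right border
    return M

def step(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

M = build_board()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # ball starts in the third bucket
after_first = step(start, M)
after_second = step(after_first, M)

stationary = [0.08] + [0.12] * 7 + [0.08]
unchanged = step(stationary, M)          # equals stationary up to rounding
```

Because each row of M has at most three non-zero entries, a sparse representation would make every step linear in the number of states.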

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: states s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take.

Starting in s1 at t = 0, the state distribution evolves as follows:
  – t = 0: (1, 0, 0)
  – t = 1: (0, 0, 1)
  – t = 2: (0, 0.8, 0.2)
  – t = 3: (0.48, 0.16, 0.36)

Worlds that already intersected the window at t = 2 must not be counted again at t = 3. Propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3, so:

Result = 0.8 + 0.16 = 0.96

› The solution is based on matrix multiplications; it introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

State order: s1, s2, s3, s4 (s4 is the new state)

M− =
0   0   1   0
0.6 0   0.4 0
0   0.8 0.2 0
0   0   0   1

M+ =
0 0 1   0
0 0 0.4 0.6
0 0 0.2 0.8
0 0 0   1

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)
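The two-matrix scheme can be sketched in Python as follows. The sketch assumes that M− is applied for transitions into time ticks outside the query interval and M+ for transitions into ticks inside it, and that the query interval does not contain t = 0. The function and parameter names are illustrative.

```python
def window_probability(M, start, window_states, window_times, horizon):
    """Probability that the object visits a window state at least once
    during the window times, using one extra absorbing 'hit' state."""
    n = len(M)
    hit = n                                   # index of the new state

    def augmented(tick_in_window):
        A = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(n):
                # Inside the window, mass entering a window state is absorbed.
                absorb = tick_in_window and j in window_states
                A[i][hit if absorb else j] += M[i][j]
        A[hit][hit] = 1.0
        return A

    m_minus, m_plus = augmented(False), augmented(True)
    dist = list(start) + [0.0]
    for t in range(1, horizon + 1):
        A = m_plus if t in window_times else m_minus
        dist = [sum(dist[i] * A[i][j] for i in range(n + 1))
                for j in range(n + 1)]
    return dist[hit]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=[1, 0, 0], window_states={0, 1},
                            window_times={2, 3}, horizon=3)
```

For the example of the slides this yields 0.96, matching the vector (0, 0, 0.04, 0.96) at t = 3. The number of matrix-vector products equals the number of ticks, independent of the exponential number of possible paths.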

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space; left: a single observation at t0; right: observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

State order: S1−, S2−, S3−, S1+, S2+, S3+ (−: window not intersected, +: window intersected)

M− =
M 0
0 M

M+ =
0 0 1   0   0   0
0 0 0.4 0.6 0   0
0 0 0.2 0   0.8 0
0 0 0   0   0   1
0 0 0   0.6 0   0.4
0 0 0   0   0.8 0.2

with M =
0   0   1
0.6 0   0.4
0   0.8 0.2

  – t = 0: (1, 0, 0, 0, 0, 0)
  – t = 1: (0, 0, 1, 0, 0, 0)
  – t = 2: (0, 0, 0.2, 0, 0.8, 0)
  – t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

107

Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With "hit" denoting that the trajectory intersects the window and "obs" denoting the observation:

P(hit | obs) = P(obs | hit) · P(hit) / P(obs) = P(hit ∧ obs) / P(obs) = P(hit ∧ obs) / (P(obs ∧ hit) + P(obs ∧ ¬hit))

With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

P(hit | obs) = 0.32 / (0.32 + 0.04) = 0.89
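The bookkeeping with the classes Si− and Si+ and the final Bayes step can be sketched in Python as follows. Instead of materializing the 6 × 6 matrices, the sketch keeps two vectors, one for the worlds that have not yet intersected the window and one for those that have. It assumes a second, certain observation at the last time tick; all names are illustrative.

```python
def conditional_hit_probability(M, start, window_states, window_times,
                                obs_time, obs_state):
    """P(trajectory intersects the window | object observed in obs_state
    at obs_time), for a Markov chain with transition matrix M."""
    n = len(M)
    miss = list(start)        # mass per state of worlds without a hit (Si-)
    hit = [0.0] * n           # mass per state of worlds with a hit (Si+)
    for t in range(1, obs_time + 1):
        new_miss, new_hit = [0.0] * n, [0.0] * n
        for i in range(n):
            for j in range(n):
                p = M[i][j]
                new_hit[j] += hit[i] * p          # once hit, always hit
                if t in window_times and j in window_states:
                    new_hit[j] += miss[i] * p     # first intersection
                else:
                    new_miss[j] += miss[i] * p
        miss, hit = new_miss, new_hit
    # Bayes: P(hit and obs) / (P(hit and obs) + P(not hit and obs))
    return hit[obs_state] / (hit[obs_state] + miss[obs_state])

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = conditional_hit_probability(M, start=[1, 0, 0],
                                        window_states={0, 1},
                                        window_times={2, 3},
                                        obs_time=3, obs_state=2)
```

For the example this gives 0.32 / 0.36, i.e. roughly 0.89, which is lower than the unconditioned 0.96 because the observation in s3 rules out many of the hit worlds.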

108

Summary

› Pros
  – Allows to answer time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


20

Possible Worlds Example II

[Figure: uncertain objects labelled A to Z]

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

Querying Uncertain Data: Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning: Aggregation

[Figure: query point q with uncertain objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected positions

Most-likely positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds: An intuition

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds: An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with the remaining objects C, D, H, I]

35

Equivalent Worlds: An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

[Figure: query region around q with the remaining objects C, D, H, I]

36

Generating Functions

[Figure: query region around q with uncertain objects; probability labels 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects. Substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

Generating Functions: Formally

For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_{i} (1 - p_i + p_i \cdot x) = \sum_{k} c_k x^k$

In the expanded polynomial, the coefficient c_k of monomial x^k equals the probability that exactly k objects are inside the query region.

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
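The expansion of such a generating function amounts to repeated polynomial multiplication. A minimal Python sketch, assuming mutually independent objects that are inside the query region with probabilities 0.8, 0.2 and 0.4 (values borrowed from the figure labels for illustration; the function name is hypothetical):

```python
def count_distribution(probs):
    """Coefficients c_0..c_n of the product of (1 - p + p*x) over all p,
    where c_k is the probability that exactly k objects are inside."""
    coeffs = [1.0]                      # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: count stays k
            nxt[k + 1] += c * p         # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

distribution = count_distribution([0.8, 0.2, 0.4])
```

For these three probabilities the expansion is 0.096 + 0.472x + 0.368x² + 0.064x³. Each multiplication costs time linear in the current degree, so the full count distribution of n objects is obtained in O(n²) time instead of enumerating 2^n possible worlds.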

41

Count Queries on Uncertain Data

[Figure: example with the query region around q and uncertain objects; probability labels 0.8, 0.2, 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and unbiased

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling: Example

[Figure: query region around q with uncertain objects; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

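Sampling: Confidences

[Figure: query region around q with uncertain objects; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test: the interval $\hat{p} \pm z \sqrt{\hat{p}(1 - \hat{p})/n}$, where z is the corresponding percentile of the standard normal distribution and n is the number of samples

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability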
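The Wald interval is a one-line computation. The sketch below assumes an estimator of 0.54 from n = 100 sampled worlds and z = 1.96 (the two-sided 95% level). These three values are not stated on the slide; they are assumptions chosen because they reproduce the interval [0.442, 0.638].

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs: 54 of 100 sampled worlds satisfy the predicate.
low, high = wald_interval(0.54, 100)
```

Since the half-width shrinks with the square root of n, halving the interval requires four times as many sampled worlds.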

50

Uncertain Spatial Data Management: Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: "best guess" answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: Generating Functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3d-TRAJECTORY over the X, Y and Time axes up to the present time, together with its 2d-ROUTE in the X-Y plane]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: samples p1 at t1 and p2 at t2 on a road network, with an intermediate position q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with T = t1 marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with T = tb1, T = t1 and T = t11 marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) has the exclusive non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper levels subdivide the interval of their parent, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], …]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_{i}(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \cdot \prod_{i \ne j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…). Here $pdf^{WD}_{j}$ denotes the derivative of $P^{WD}_{j}$ with respect to Rd.

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1 to Tr5 with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane; each is a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(‖Vi − Vq‖ ≤ Rd).

Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq). Because Vi and Vq are independent, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq), each with a support of diameter 4r; axes x, y, pdf]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form diq(t) = sqrt(A·t² + B·t + C), which is a hyperbola since A > 0.
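The formula on this slide was lost in extraction. Assuming (notation introduced here) that the expected location along $TR_{iq}$ is at $(x_{iq}, y_{iq})$ at $t = 0$ and moves with the constant velocity components defined above, a derivation consistent with the remark "hyperbola since A > 0" would be:

```latex
d_{iq}(t) = \sqrt{(x_{iq} + V_{x,iq}\, t)^2 + (y_{iq} + V_{y,iq}\, t)^2}
          = \sqrt{A t^2 + B t + C}

A = V_{x,iq}^2 + V_{y,iq}^2, \qquad
B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}), \qquad
C = x_{iq}^2 + y_{iq}^2
```

$A > 0$ holds whenever the relative velocity is non-zero. Setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides gives a quadratic equation in $t$, which has at most two real solutions.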

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: Example – lower envelope LE12 of two distance-trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; NOTE: time dimension on the horizontal axis, dist on the vertical axis]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
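The merge step can be sketched in Python as follows. This is only an illustration under assumptions: each distance function is given by the hypothetical coefficient triple (A, B, C) of its squared distance A·t² + B·t + C (comparing squared distances yields the same envelope), an envelope is a list of (start, end, function index) pieces, and no claim is made that this simple version reaches the O(N log N) bound.

```python
import math

def value(f, t):
    """Squared distance A*t^2 + B*t + C at time t."""
    return f[0] * t * t + f[1] * t + f[2]

def crossings(f, g, lo, hi):
    """Critical time-points strictly inside (lo, hi) where f and g are equal."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if lo < t < hi]

def merge(env1, env2, funcs):
    """Merge two lower envelopes that cover the same time interval."""
    out, i, j = [], 0, 0
    while i < len(env1) and j < len(env2):
        s1, e1, f1 = env1[i]
        s2, e2, f2 = env2[j]
        lo, hi = max(s1, s2), min(e1, e2)
        cuts = [lo] + crossings(funcs[f1], funcs[f2], lo, hi) + [hi]
        for a, b in zip(cuts, cuts[1:]):
            mid = (a + b) / 2
            winner = f1 if value(funcs[f1], mid) <= value(funcs[f2], mid) else f2
            if out and out[-1][2] == winner:
                out[-1] = (out[-1][0], b, winner)  # extend the previous piece
            else:
                out.append((a, b, winner))
        if e1 == hi:
            i += 1
        if e2 == hi:
            j += 1
    return out

def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer over the function indices, as in Merge-Sort."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    mid = len(ids) // 2
    return merge(lower_envelope(ids[:mid], funcs, tb, te),
                 lower_envelope(ids[mid:], funcs, tb, te), funcs)

# Toy input: a static object at distance 2 and one passing by at distance 1.
funcs = [(0.0, 0.0, 4.0), (1.0, -10.0, 26.0)]
envelope = lower_envelope([0, 1], funcs, 0.0, 10.0)
```

For the toy input, the envelope switches from the first function to the second and back, at the two critical time-points 5 ± sqrt(3).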

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1–TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each possible path is equally likely.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively, pi+1) within time period t – ti (respectively, ti+1 – t).
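The definition of the possible paths can be illustrated with a small Python sketch. The toy road network, its vertex names and the helper `possible_paths` are hypothetical, and the two samples are assumed to lie on vertices; the sketch simply enumerates the simple paths whose minimum travel time fits into the time budget ti+1 – ti and then assigns a uniform probability.

```python
def possible_paths(graph, src, dst, budget):
    """Enumerate simple paths src -> dst whose minimum travel time fits the budget.

    graph maps a vertex to {neighbor: minimum traversal time of that edge}.
    """
    results = []

    def extend(path, cost):
        node = path[-1]
        if node == dst:
            results.append((tuple(path), cost))
            return
        for nxt, t in graph[node].items():
            # Skip vertices already on the path and edges that exceed the budget.
            if nxt not in path and cost + t <= budget:
                extend(path + [nxt], cost + t)

    extend([src], 0.0)
    return results

# Hypothetical toy network (undirected, times in abstract units).
graph = {
    "v1": {"v5": 3.0, "v3": 2.0},
    "v3": {"v1": 2.0, "v4": 2.0},
    "v4": {"v3": 2.0, "v5": 1.0, "v6": 4.0},
    "v5": {"v1": 3.0, "v4": 1.0},
    "v6": {"v4": 4.0},
}

paths = possible_paths(graph, "v1", "v5", budget=6.0)
# Uniform pdf over the possible paths.
path_prob = {p: 1.0 / len(paths) for p, _ in paths}
```

With a budget of 6 time units both the direct edge and the detour via v3 and v4 qualify; with a tighter budget only the direct edge remains.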

73

Combine the two probabilities…

[Figure: possible paths, and possible locations when t = 4; object a and segment mp1 are marked]

Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \mid PL_p^a(t))$ (e.g., uniform pdf)
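The combination is an application of the law of total probability over the possible paths. A minimal Python sketch, in which the helper name and the input values are assumptions chosen to reproduce the 0.5 × 0.5 = 0.25 example:

```python
def prob_in_segment(paths):
    """Sum over paths of P(path) * P(object lies in S | path)."""
    return sum(p_path * p_in_s for p_path, p_in_s in paths)

# Two equally likely paths. Under a uniform location pdf, half of the possible
# locations on the first path fall inside S and none on the second one
# (assumed values, not read from the slide).
p = prob_in_segment([(0.5, 0.5), (0.5, 0.0)])
```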

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time intervals intersect with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" from q along road-network segments (vertices A–E); annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]
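The network-expansion step can be sketched as a distance-bounded Dijkstra search. The Python below is an assumption-laden illustration, not the authors' implementation: q is assumed to sit on a vertex, the graph format and both function names are hypothetical, and an edge counts as part of the expansion as soon as one of its endpoints is reached.

```python
import heapq

def network_expansion(graph, q, r):
    """Return {vertex: network distance} for all vertices closer than r to q."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def edges_in_expansion(graph, dist):
    """Edges with at least one endpoint reached by the expansion."""
    return {tuple(sorted((u, v))) for u in dist for v in graph[u]}

# Hypothetical toy network with edge lengths.
graph = {
    "q": {"A": 1, "B": 4},
    "A": {"q": 1, "C": 1},
    "B": {"q": 4, "D": 1},
    "C": {"A": 1, "E": 5},
    "D": {"B": 1},
    "E": {"C": 5},
}
reached = network_expansion(graph, "q", r=3)
candidate_edges = edges_in_expansion(graph, reached)
```

Only the movement R-trees of `candidate_edges` would have to be fetched in the filtering step.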

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data-size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
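As an illustration of "define an acceptable error, save on data-size", here is a minimal Python sketch of the basic Douglas-Peucker line simplification. It is the plain recursive algorithm, not the sped-up variant of the cited paper; it measures the distance to the chord segment (a common variant) and ignores the time dimension.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed to stay within eps of the original polyline."""
    if len(points) < 3:
        return list(points)
    idx, dmax = max(
        ((i, point_segment_distance(points[i], points[0], points[-1]))
         for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears in both halves

route = [(0, 0), (1, 0.05), (2, -0.05), (3, 3), (4, 0)]
reduced = douglas_peucker(route, eps=0.1)
```

The tolerance `eps` plays the role of the acceptable error: every dropped point is guaranteed to lie within `eps` of the simplified polyline.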

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far…

Stochastic methods for uncertain spatial data
– Probabilistic results
– No time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Time dimension
– No probabilistic results

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speed limits vmin and vmax; Q is the query window]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution… the car is inside Q with probability 50% at one time instant and with probability 30% at another.

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints

Consideration of dependency: = 0.5
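The gap between 0.65 and 0.5 can be reproduced with a handful of explicit possible worlds. The Python sketch below rests on an assumption that is not stated in the extracted text: every world that is inside Q at the second time instant was already inside Q at the first one, which is one dependency structure consistent with the figures 0.5 and 0.3.

```python
# Each hypothetical possible world: (probability, in Q at t1?, in Q at t2?)
worlds = [
    (0.3, True, True),
    (0.2, True, False),
    (0.5, False, False),
]

p_t1 = sum(p for p, in_t1, _ in worlds if in_t1)   # marginal at t1
p_t2 = sum(p for p, _, in_t2 in worlds if in_t2)   # marginal at t2

# Treating the two time instants as independent over-counts the hits.
independent = 1 - (1 - p_t1) * (1 - p_t2)

# Possible-worlds semantics: add up the worlds that are in Q at least once.
dependent = sum(p for p, in_t1, in_t2 in worlds if in_t1 or in_t2)
```

Both computations use the same marginals; only the second one respects which positions at the two time instants can belong to the same trajectory.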

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: a ball in an inner hole moves left with probability 0.4, stays with 0.2 and moves right with 0.4; a ball in a border hole stays with 0.4 and moves to its only neighbour with 0.6]

92

A simple example

› Initial position

› After first hit: 0.4 0.2 0.4

› After second hit: 0.16 0.16 0.36 0.16 0.16

› After 40th hit: 0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08

93

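How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket)

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$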

94

How can we model this?

› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)$

› Second hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^2 = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)$

› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)$

(M is the 9 × 9 transition matrix of the example, with inner rows 0.4, 0.2, 0.4 and border rows 0.4, 0.6 and 0.6, 0.4.)
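The propagation can be reproduced with a few lines of plain Python. This is a minimal sketch; the function names are made up for illustration, and the matrix is rebuilt from the inner and border probabilities of the example.

```python
def build_matrix(n=9):
    """Transition matrix of the board example: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(vec, m):
    """One hit: row vector times transition matrix."""
    n = len(vec)
    return [sum(vec[i] * m[i][j] for i in range(n)) for j in range(n)]

def propagate(vec, m, hits):
    for _ in range(hits):
        vec = step(vec, m)
    return vec

M = build_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third bucket
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
```

Since M is very sparse, a real implementation would use a sparse matrix representation instead of nested lists.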

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with their transition probabilities, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

› State distribution over time, starting from (1, 0, 0) at t = 0
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)

› Keeping only the worlds that have not been in s1 or s2 during T: (0, 0, 0.2) at t = 2 becomes (0, 0.16, 0.04) at t = 3, so only 0.04 never enters the window. Result = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices

103

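ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \rightarrow (0, 0, 1, 0) \rightarrow (0, 0, 0.2, 0.8) \rightarrow (0, 0, 0.04, 0.96)$$

States: s1, s2, s3 and the new state s4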
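The construction of the two matrices can be sketched in Python as follows. The function names are hypothetical, the start time t = 0 is assumed to lie outside the query window, and states are indexed from 0, so the window {s1, s2} becomes {0, 1}. M− keeps the original transitions and makes the extra state absorbing; M+ additionally redirects every transition into a window state to the extra state.

```python
def vec_mat(vec, m):
    """Row vector times matrix."""
    return [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]

def window_matrices(m, window):
    """Build M- and M+ with one extra absorbing state for 'window already hit'."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        hit = sum(p for j, p in enumerate(row) if j in window)
        m_plus.append([0.0 if j in window else p for j, p in enumerate(row)] + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(m, start, window, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = window_matrices(m, window)
    vec = list(start) + [0.0]
    for t in range(1, t_to + 1):
        # Transitions that lead into a time inside the window use M+.
        vec = vec_mat(vec, m_plus if t >= t_from else m_minus)
    return vec[-1]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], window={0, 1}, t_from=2, t_to=3)
```

The number of matrix multiplications grows linearly with the length of the time horizon, in contrast to the exponential number of possible paths.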

104

Multiple Observations

› So far, we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations, we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space; left with a single observation at t0, right with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

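Multiple Observations

State order: (S1–, S2–, S3–, S1+, S2+, S3+); the Si– classes are the worlds with ¬∎ (window not intersected), the Si+ classes the worlds with ∎ (window intersected)

– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$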

107

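Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$

With the vector (0, 0, 0.04, 0.48, 0.16, 0.32) over the classes (S1–, S2–, S3–, S1+, S2+, S3+):

$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89$$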
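The whole computation with 2|S| classes can be sketched in Python. Function names are hypothetical, states are indexed from 0, and the second observation (object seen in s3) is assumed to be made at t = 3, which is consistent with the numbers above.

```python
def vec_mat(vec, m):
    """Row vector times matrix."""
    return [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]

def class_matrices(m, window):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # M-: block-diagonal, the hit / no-hit status never changes.
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = []
    for row in m:  # leaving a "not yet hit" class
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    for row in m:  # leaving an "already hit" class
        m_plus.append(zero + row[:])
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = class_matrices(M, window={0, 1})

vec = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # first observation: s1 at t = 0, no hit yet
for t in range(1, 4):
    vec = vec_mat(vec, m_plus if t >= 2 else m_minus)   # window T = [2, 3]

# Bayes: condition on the observation "object is in s3 at t = 3".
p_hit_given_s3 = vec[5] / (vec[5] + vec[2])
```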

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over the location/time space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But: how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
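The basic Monte-Carlo idea can be sketched in Python for the window query of the three-state example. This covers only plain forward sampling from a single observation; the adaptation of the transition matrices that makes samples conform to several observations is not shown. The function names and the fixed seed are assumptions made for illustration.

```python
import random

def sample_trajectory(m, start, steps, rng):
    """Draw one possible trajectory (list of state indices) from the Markov chain."""
    state, path = start, [start]
    for _ in range(steps):
        state = rng.choices(range(len(m)), weights=m[state])[0]
        path.append(state)
    return path

def estimate_window_probability(m, start, window, t_from, t_to, n, seed=0):
    """Ratio of sampled trajectories that are in a window state during [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        path = sample_trajectory(m, start, t_to, rng)
        if any(path[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, start=0, window={0, 1},
                                       t_from=2, t_to=3, n=20000)
```

The estimate should lie close to the exact window-query result 0.96 computed with matrix multiplications.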

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


21–27

[Slides 21–26: figures only]

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning: Aggregation

[Figure: query region around q with the objects C, D, H and I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected positions

Most-likely positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds: An intuition

[Figure: query region around q with the objects C, D, H and I]

Observation 1

For any possible world w and any possible world w′ derived from w by changing the position of an object outside of the query region, the query result is the same in w and w′

Observation 1 allows discarding objects outside of the query region

Remaining number of equivalent classes of possible worlds

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded

36

Generating Functions

[Figure: query region around q; the objects A, B and C are labelled with the probabilities 0.8, 0.2 and 0.4; H and H3 are further labels in the figure]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects – substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

Generating Functions: Formally

For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i} (1 - p_i + p_i \cdot x) = \sum_{k} c_k \cdot x^k$$

In the expanded polynomial, the coefficient c_k of the monomial x^k equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
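The polynomial expansion can be sketched in Python. The helper name is made up, and the input probabilities 0.8, 0.2 and 0.4 are the figure labels of the running example, assumed here to be the probabilities of the three objects being inside the query region.

```python
def count_distribution(probs):
    """Expand the product of (1 - p + p*x); coeffs[k] = P(exactly k objects inside)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)      # object outside: count stays at k
            nxt[k + 1] += c * p        # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
```

Each object adds one polynomial multiplication of linear cost, so the full count distribution of n objects needs O(n²) operations instead of enumerating 2^n possible worlds.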

41

Count Queries on Uncertain Data

[Figure: query region around q; the objects A, B and C are labelled with the probabilities 0.8, 0.2 and 0.4; H and H3 are further labels in the figure]

Example: [step-by-step expansion of the generating function for the depicted objects; the equations were lost in extraction]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

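Sampling Example

[Figure: query region around q; the objects A, B and C are labelled with the probabilities 0.8, 0.2 and 0.4; H and H3 are further labels in the figure]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of the estimations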

49

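Sampling Confidences

[Figure: query region around q; the objects A, B and C are labelled with the probabilities 0.8, 0.2 and 0.4; the true probability is marked]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators

E.g., Wald test, where z is the percentile of the standard normal distribution that corresponds to the chosen significance level α

At a significance level of α, the true probability is in the interval [0.442, 0.638]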
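The Wald interval can be sketched in Python. The estimate 0.54 and the significance level 0.05 are not stated in the extracted text; they are assumed values that, together with the 100 samples mentioned on the slide, reproduce the interval [0.442, 0.638].

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha):
    """Wald confidence interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) quantile of N(0, 1)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs: estimator 0.54 from n = 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100, 0.05)
```

The interval narrows with the square root of the number of sampled worlds, so halving its width requires four times as many samples.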

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its projection, the 2d-ROUTE, in the (X, Y) plane]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure label: now ->]

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed: exact location samples
  • Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: samples p1 (at t1) and p2 (at t2) on a road network with a query point q; time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine the interval of their parent into sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121), …]

Root: parameters of the query
– Querying trajectory Trq
– Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5; the distances Rmin, Rmax and Rd are marked, and the pruned disk is annotated "Rmin > Rmax"]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has probability 0 of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
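The pruning observation can be sketched as follows. This is a minimal illustration, assuming that Rmax is the smallest of the maximum distances between Q and the uncertainty disks; the function name `prune_nn_candidates` and the sample disks are hypothetical.

```python
import math

def prune_nn_candidates(q, disks):
    """Return (r_max, ids of the objects that may still be the NN of q).

    q     -- crisp query point (x, y)
    disks -- dict: object id -> ((cx, cy), radius), the uncertainty disk
    Assumption: r_max is the smallest maximum distance over all disks.
    """
    bounds = {}
    for oid, ((cx, cy), radius) in disks.items():
        d = math.hypot(cx - q[0], cy - q[1])
        # (minimum distance, maximum distance) between q and the disk
        bounds[oid] = (max(0.0, d - radius), d + radius)
    r_max = min(hi for _, hi in bounds.values())
    # An object whose minimum distance exceeds r_max can never be the NN.
    return r_max, [oid for oid, (lo, _) in bounds.items() if lo <= r_max]

# Hypothetical data: Tr4 is far away and gets pruned.
disks = {
    "Tr1": ((2.0, 0.0), 1.0),
    "Tr2": ((0.0, 3.0), 1.5),
    "Tr4": ((10.0, 0.0), 1.0),
}
r_max, candidates = prune_nn_candidates((0.0, 0.0), disks)
```

Only the surviving candidates need to enter the (expensive) probability computation.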

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the pdf of Tri over the part of its uncertainty disk that lies within distance Rd of Q

The probability that a given object Trj is the NN of Q follows from the observation that Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd; integrate over all possible values of Rd (NOTE: the upper bound is Rmax…)
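A sketch of the two formulas the text refers to, written in assumed notation: $D_i$ is the uncertainty disk of Tri, $B(Q, R_d)$ the disk of radius $R_d$ around Q, $P^{WD}_i$ the within-distance probability and $pdf^{WD}_j$ its derivative with respect to $R_d$. It follows the verbal description above rather than the original slide.

```latex
P^{WD}_i(R_d) = \iint_{D_i \,\cap\, B(Q, R_d)} pdf_i(x, y)\, dx\, dy

P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \neq j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d,
\qquad pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d)
```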

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5, with Rmin and Rmax marked; two possible locations Z1 and Z2 satisfy dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the (x, y) plane; each is uniform over a disk of diameter 2r]

Example: assuming a uniform pdf over the possible locations, the figure marks just two of the points that have to be considered when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi and Vq are within distance Rd of each other

Define Viq = Vi - Vq (cross-correlation) as another random variable
Viq = Vi + (-Vq)
which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the (x, y) plane; each has a support of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1
IF pdf(Vi) has a centroid Ci coinciding with E(Vi)
AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2
Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around the vertical axis (the axis of the pdf values) through their centroids
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
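The convolution argument and Property 1 can be checked numerically. The sketch below is a simplified 1D, integer-valued analogue of the 2D setting on the slides; the helper names and the two sample pmfs are assumptions made for illustration.

```python
def convolve(a, b):
    """Discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def difference_pmf(pmf_i, start_i, pmf_q, start_q):
    """pmf of V_iq = V_i + (-V_q) for independent integer-valued V_i, V_q.

    pmf_x[k] is the probability of the value start_x + k.
    Returns (pmf, start) in the same representation.
    """
    neg_q = pmf_q[::-1]                       # pmf of -V_q
    neg_start = -(start_q + len(pmf_q) - 1)   # smallest value of -V_q
    return convolve(pmf_i, neg_q), start_i + neg_start

def mean(pmf, start):
    """Expected value (centroid) of a pmf in the (pmf, start) representation."""
    return sum((start + k) * p for k, p in enumerate(pmf))

# Hypothetical pmfs: V_i symmetric around 5, V_q symmetric around 2.
pmf_i, start_i = [0.25, 0.5, 0.25], 4
pmf_q, start_q = [1 / 3, 1 / 3, 1 / 3], 1
pmf_iq, start_iq = difference_pmf(pmf_i, start_i, pmf_q, start_q)
```

In this example the centroid of the difference is 5 - 2 = 3, and the resulting pmf is again symmetric around it, mirroring Properties 1 and 2 in one dimension.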

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location moves along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$

which is a hyperbola since A = Vxiq² + Vyiq² > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points

Example

[Figure: lower envelope LE12 of the two distance functions TR1 and TR2 over [tb, te], with the critical time points t11 and t12 marked. NOTE: time dimension on the horizontal axis, distance on the vertical axis]
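Why two distance functions intersect at most twice can be seen by comparing squared distances: each is a quadratic in t, so their difference has at most two roots. A minimal sketch, assuming linear relative motion (x0 + vx·t, y0 + vy·t) with respect to the query object; all names and the sample values are hypothetical.

```python
import math

def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for the relative
    position (x0 + vx * t, y0 + vy * t) with respect to the query object."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))

def critical_times(ci, cj, tb, te, eps=1e-12):
    """Time points in [tb, te] at which the two distance functions are equal."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:                      # difference is (at most) linear
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []                     # the functions never intersect
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical trajectories: Tr1 passes by the query object, Tr2 keeps a
# constant offset (same velocity as the query, so its A is 0).
d1 = squared_distance_coeffs(-5.0, 1.0, 1.0, 0.0)
d2 = squared_distance_coeffs(0.0, 2.0, 0.0, 0.0)
times = critical_times(d1, d2, 0.0, 10.0)
```

Here `times` holds the two critical time points at which Tr1 and Tr2 swap their order with respect to the query object.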

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) are merged into LE1234, which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
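For illustration, the lower envelope can also be obtained with a much simpler (and slower) method than the divide-and-conquer construction named above: collect all pairwise critical time points and pick the closest trajectory on each resulting sub-interval. The sketch below is this naive quadratic-time variant, not the O(N log N) algorithm; names and data are hypothetical.

```python
import math
from itertools import combinations

def lower_envelope(quads, tb, te):
    """Naive lower envelope of squared-distance quadratics over [tb, te].

    quads -- dict: trajectory id -> (A, B, C) with d(t)^2 = A t^2 + B t + C
    Returns a list of (id, t_start, t_end) pieces ordered by time.
    """
    cuts = {tb, te}
    for (_, ci), (_, cj) in combinations(quads.items(), 2):
        a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
        if abs(a) < 1e-12:
            roots = [] if abs(b) < 1e-12 else [-c / b]
        else:
            disc = b * b - 4.0 * a * c
            if disc < 0:
                roots = []
            else:
                s = math.sqrt(disc)
                roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
        cuts.update(t for t in roots if tb < t < te)
    times = sorted(cuts)
    pieces = []
    for lo, hi in zip(times, times[1:]):
        mid = (lo + hi) / 2.0   # no intersection inside (lo, hi), so one test suffices
        best = min(quads, key=lambda k: quads[k][0] * mid * mid + quads[k][1] * mid + quads[k][2])
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], hi)   # extend the previous piece
        else:
            pieces.append((best, lo, hi))
    return pieces

quads = {"Tr1": (1.0, -10.0, 26.0), "Tr2": (0.0, 0.0, 4.0)}
pieces = lower_envelope(quads, 0.0, 10.0)
```

The returned pieces have the same shape as the time-parameterized answer [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …] of the crisp continuous NN query.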

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function lies more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the critical time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform, each of the |PPi(a)| possible paths is taken with probability 1 / |PPi(a)|

Given a path Pj ∈ PPi(a), the possible locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj that a can reach from pi within the time period t - ti and from which a can reach pi+1 within the time period ti+1 - t

73

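Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4; segment mp1 is highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a, t, S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(S \mid PL_p^a(t)\bigr)$

where the second factor is, e.g., a uniform pdf over the possible locations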
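The combination of path probability and location probability can be sketched as below, assuming a uniform pdf over the possible locations of each path and representing locations as 1D offsets along a path. The data layout and the numbers (chosen to reproduce 0.5 · 0.5 = 0.25) are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap of two 1D intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def prob_in_segment(paths, segment_on_path):
    """Probability that the object lies inside a segment at a fixed time t.

    paths           -- dict: path id -> (path probability, (lo, hi) interval
                       of possible locations along that path at time t)
    segment_on_path -- dict: path id -> (lo, hi) extent of the segment on
                       that path; paths not containing the segment are absent
    Assumption: uniform pdf over the possible locations of each path.
    """
    total = 0.0
    for name, (p_path, locations) in paths.items():
        seg = segment_on_path.get(name)
        if seg is None:
            continue
        length = locations[1] - locations[0]
        if length > 0:
            total += p_path * overlap(locations, seg) / length
    return total

# Two equally likely paths; the segment covers half of the possible
# locations on path P1 and does not lie on path P2.
paths = {"P1": (0.5, (0.0, 4.0)), "P2": (0.5, (0.0, 4.0))}
segment = {"P1": (2.0, 4.0)}
p = prob_in_segment(paths, segment)
```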

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  - The collection of all the possible paths (and their probabilities)
  - A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  - Edge hash-table
    › Small, in-memory
  - Movement R-tree
    › 1D, one for each edge
  - Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to the set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand

76

› Network expansion
  - Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  - Fetch the movement R-trees of the edges in the expansion tree
  - Search for leaf entries whose maximum time interval intersects tq
  - The objects pointed to by those leaf entries are candidates

› Refinement
  - The candidate set is considerably smaller than the original trajectory dataset
  - Calculate the qualification probability for each candidate

[Figure: road network with the vertices A, B, C, D, E around the query point q; the expansion tree "bubbles" along road-network segments; given the earliest/latest times of arrival, one path is a possible path but definitely not part of the answer to the range query]
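The network expansion step can be sketched as a distance-bounded Dijkstra search. This is an illustrative simplification: it returns the vertices within network distance r of q (using ≤ r) and ignores partially covered edges; the graph and all names are hypothetical.

```python
import heapq

def network_expansion(graph, q, r):
    """Vertices within network distance r of q, with their distances.

    graph -- dict: vertex -> list of (neighbor, edge_length)
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            # Expand only while the network distance stays within r.
            if nd <= r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical road network around the query point q.
graph = {
    "q": [("A", 2.0), ("C", 4.0)],
    "A": [("q", 2.0), ("B", 2.0)],
    "B": [("A", 2.0), ("D", 5.0)],
    "C": [("q", 4.0), ("E", 3.0)],
    "D": [("B", 5.0)],
    "E": [("C", 3.0)],
}
reached = network_expansion(graph, "q", 5.0)
```

The edges between reached vertices are the ones whose movement R-trees would be fetched in the filtering step.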

77

Temporal-continuous query
  - Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
  - Easy to prove that the actual QP is monotonic in between consecutive critical time instants
  - The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty - the flip-side

› Data reduction
  - Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-simplification Algorithm", SSHD 1997
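As an illustration of "define an acceptable error, save on data size", the sketch below is the plain recursive Douglas-Peucker simplification (not the sped-up variant of the cited paper), using the distance to the chord segment as the error measure. The function names and sample points are hypothetical.

```python
import math

def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Simplify a polyline so that every dropped point is within epsilon
    of the segment that replaces it."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right   # the split point appears only once

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(track, 0.1)
```

The dropped points introduce a bounded location error, which is exactly the kind of uncertainty the earlier models have to account for.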

79

Uncertainty - the other flip-side

› What are the important properties that must not be distorted by the reduction?
  - Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  - It is not just a "Z" axis… one cannot travel back in time
  - How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
- Probabilistic results
- Time dimension

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results
- Time dimension

82

Merge the approaches

› Idea
  - Discretize the time
  - For each point in time, a pdf (pmf) over the possible positions is given

› Examples
  - Nearest neighbour queries on uncertain moving objects
  - Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with the query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

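A highway example…

[Figure: position (pos) over time (t) with the query window Q; the speed bounds vmin and vmax delimit the reachable positions]

What is the probability that the car is in some area at least once during some time interval?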

85

A highway example…

[Figure: position (pos) over time (t) with the query window Q; the speed bounds vmin and vmax delimit the reachable positions]

Assuming uniform distribution…

86

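A highway example…

[Figure: position (pos) over time (t) with the query window Q, which is hit with probabilities 50% and 30% at the two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65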

87

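A highway example…

[Figure: position (pos) over time (t) with the query window Q, which is hit with probabilities 50% and 30% at the two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints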

88

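A highway example…

[Figure: position (pos) over time (t) with the query window Q, which is hit with probabilities 50% and 30% at the two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5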
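A short worked comparison of the two results. The event names $A$ (car in Q at the first time instant, probability 0.5) and $B$ (car in Q at the second, probability 0.3) are assumptions, as is the reading that every trajectory hitting Q at the second instant also hit it at the first ($B \subseteq A$), which is one situation consistent with the value 0.5.

```latex
\text{Independence: } P(A \cup B) = 1 - (1 - 0.5)(1 - 0.3) = 0.65

\text{Dependency } (B \subseteq A): \quad
P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.3 - 0.3 = 0.5
```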

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  - Discrete time + discrete space (e.g., Markov chain)
  - Discrete time + continuous space (e.g., Harris chain)
  - Continuous time + discrete space (e.g., Markov process)
  - Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

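› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; in the interior the ball moves to the left hole with probability 0.4, stays with 0.2 and moves to the right hole with 0.4; at the border it moves to the neighbouring hole with probability 0.6 and stays with 0.4]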

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

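How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$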

94

How can we model this?

› First hit

$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$

› Second hit

$(0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0.16 \;\; 0.16 \;\; 0.36 \;\; 0.16 \;\; 0.16 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$

› 40th hit

$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{40} \approx (0.08 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.08)$
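The propagation of the ball's position can be reproduced with plain vector-matrix products. A minimal sketch in pure Python; the helper names are hypothetical, the matrix is the 9-bucket M from the slides.

```python
def build_chain(n=9, stay=0.2, step=0.4, border_stay=0.4, border_step=0.6):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_step
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = border_step, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = step, stay, step
    return m

def step_once(dist, m):
    """One hit: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def propagate(dist, m, hits):
    for _ in range(hits):
        dist = step_once(dist, m)
    return dist

M = build_chain()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
# Distribution the chain converges to; it satisfies stationary * M = stationary.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

After many hits the start bucket no longer matters: the distribution approaches the stationary one.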

95

Fusion of Model and Reality

› Discretization of time and space
  - E.g., treat intersections as states and add additional states on long streets
  - The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  - Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  - The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: transition graph over the states s1, s2, s3 with the edge probabilities of M]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

t = 0: (1, 0, 0)

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3, after removing the worlds that already hit the window at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices

103

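ST - Window Queries

$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

State order: (s1, s2, s3, s4), where s4 is the new state collecting the trajectories that have hit the window

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)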
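The two-matrix scheme can be sketched as follows: M- is applied for time steps before the query window, M+ redirects every transition into a window state to the extra absorbing state. The function names are hypothetical, and the sketch assumes the window starts at t ≥ 1 (a start state inside the window at t = 0 is not handled).

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def augment(m, window_states):
    """Build (M_minus, M_plus) with one extra absorbing 'hit' state appended."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        new = [0.0 if j in window_states else row[j] for j in range(n)]
        new.append(sum(row[j] for j in window_states))  # redirected into 'hit'
        m_plus.append(new)
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(m, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus, m_plus = augment(m, window_states)
    v = list(start) + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States are 0-indexed: the window {s1, s2} becomes {0, 1}; T = [2, 3].
p = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

The cost is one sparse vector-matrix product per time step, instead of enumerating the exponentially many paths.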

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
  - 2|S| classes of equivalent worlds
  - One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  - One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

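State order: (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds with ¬∎ (window not intersected) and the classes Si+ the worlds with ∎

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]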

107

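Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$

State vector at t = 3 in the order (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89$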
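The 2|S|-class bookkeeping and the final Bayes step can be sketched together. The sketch assumes that the second observation is made at the last time step of the window, and it does not guard against an observation of probability zero; all function names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def doubled_matrices(m, window_states):
    """(M_minus, M_plus) over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = []
    for row in m:  # rows of the S- classes
        stay_minus = [0.0 if j in window_states else row[j] for j in range(n)]
        to_plus = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay_minus + to_plus)
    m_plus += [zero + row[:] for row in m]   # S+ classes stay in S+
    return m_minus, m_plus

def hit_given_observation(m, start_state, window_states, t_from, t_to, obs_state):
    """P(window hit in [t_from, t_to] | object observed in obs_state at t_to)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start_state] = 1.0
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    return v, hit / (hit + miss)          # Bayes: P(hit and obs) / P(obs)

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1, window {s1, s2} during T = [2, 3], observed in s3 at t = 3.
v3, p = hit_given_observation(M, 0, {0, 1}, 2, 3, 2)
```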

108

Summary

› Pros
  - Allows answering time-parameterized queries according to possible worlds semantics
  - Considers location dependencies over time
  - Scales up very well since it is purely based on sparse matrix multiplications
  - Natively extendable to uncertain observations
  - Seems to work adequately on real-world data (more validation needed)

› Cons
  - Discrete time and space
  - Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: location over time, with a leaf entry labelled "P, oid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds

› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)

› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


22

23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: No

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: Specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

Querying Uncertain Data Running Example

[Figure: uncertain objects around the query point q and a circular query region centered at q]

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query region around q with the uncertain objects C, D, H and I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
- Expected positions
- Most-likely positions
- …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

[Figure: query region around q with the uncertain objects C, D, H and I]

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

[Figure: query region around q with the uncertain objects C, D, H and I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region
Information about the concrete result objects can be discarded

[Figure: query region around q with the uncertain objects C, D, H and I]

36

Generating Functions

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

37

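Generating Functions

[Figure: query region around q; the objects A, B, C are replaced by x, with the probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

38

Generating Functions

[Figure: query region around q; the objects A, B, C are replaced by x, with the probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

39

Generating Functions

[Figure: query region around q; the objects A, B, C are replaced by x, with the probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_k x^k$ implies that the probability of having exactly k results equals $c_k$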

40

Generating Functions Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_{i} (1 - p_i + p_i \cdot x)$

In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
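The polynomial multiplication can be sketched in a few lines. The sketch assumes that 0.8, 0.2 and 0.4 from the running example are the probabilities of three independent objects being inside the query region; the function name is hypothetical.

```python
def count_distribution(probabilities):
    """Coefficients of the product of (1 - p + p * x) over all objects.

    coeffs[k] is the probability that exactly k objects are inside the
    query region, assuming the objects are independent.
    """
    coeffs = [1.0]                      # the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: degree unchanged
            nxt[k + 1] += c * p         # object inside: degree + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
# dist is approximately [0.096, 0.472, 0.368, 0.064]
```

Each multiplication costs time linear in the current degree, so N objects need O(N²) operations instead of enumerating 2^N worlds.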

41

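Count Queries on Uncertain Data

Example

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

42

Count Queries on Uncertain Data

Example

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

43

Count Queries on Uncertain Data

Example

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

44

Count Queries on Uncertain Data

Example

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]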

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of the sampled results is an unbiased approximation of the true distribution of results
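The sampling procedure can be sketched for the count query of the running example. The inside-probabilities 0.8, 0.2 and 0.4, the sample size and the seed are assumptions chosen for illustration; the function name is hypothetical.

```python
import random

def sample_count_distribution(probabilities, n_samples, seed=0):
    """Estimate the distribution of the count query by sampling possible worlds.

    Each sampled world decides independently, per object, whether the
    object is inside the query region.
    """
    rng = random.Random(seed)
    counts = [0] * (len(probabilities) + 1)
    for _ in range(n_samples):
        k = sum(1 for p in probabilities if rng.random() < p)  # query on one world
        counts[k] += 1
    return [c / n_samples for c in counts]

estimate = sample_count_distribution([0.8, 0.2, 0.4], 20000)
```

With more samples the estimates move closer to the exact probabilities, but a single run gives no indication of how close they are.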

48

Sampling Example

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of the reliability or confidence of the estimations

49

Sampling Confidences

[Figure: query region around q with the uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators

E.g., Wald test

where the critical value is the corresponding percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability
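A Wald-type confidence interval can be computed as below. The estimate 0.54 and the significance level 0.05 are assumptions: together with the 100 drawn worlds they reproduce the interval [0.442, 0.638] stated above, but they are not given in the text.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # percentile of N(0, 1)
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

# Assumed inputs: estimate 0.54 from 100 sampled worlds, alpha = 0.05.
lo, hi = wald_interval(0.54, 100)
```

Quadrupling the number of samples halves the width of the interval.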

50

Uncertain Spatial Data Management Summary

Motivation
- Flood of geo-spatial data
- Enriched with additional contexts (text, social, multimedia)
- Inherent uncertainty

Data Cleaning
- "Best guess" answers
- Unreliable results
- Biased results

Paradigm of Equivalent Worlds
- Efficient solution for the most prominent types of spatial queries
- Example: Generating Functions

Approximations
- Monte-Carlo sampling
- Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty - the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the (X, Y) plane, the 2D route]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

- Sequence of (location, time) updates (e.g., GPS-based)

- Electronic maps + traffic distribution patterns + set of (to be visited) points ⇒ full trajectory

- (location, time, velocity vector) updates

[Figure: timeline with the marker "now →"]

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2

Constraints
- Motion along a road network
- Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a query point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time planes at T = tb, T = t1 and T = te]

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant. Consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time planes at T = tb, T = tb1, T = t11, T = t1 and T = te]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper levels hold nodes of the same form (trajectory, sub-interval, descriptor), e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], [Tri121, t11, t121, D121]]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q with the uncertainty disks of Tr1 to Tr5; labels Rmin, Rmax, Rd and "Rmin > Rmax"]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has "0" probability of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by the double integral of the pdf of Tri over the part of its uncertainty disk that lies within distance Rd of Q

The probability that a given object Trj is the NN of Q follows from this: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
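The two probabilities described in words on this slide can be written out as follows. This is a reconstruction from the verbal description, not the slide's own typesetting; the symbols $P^{WD}_i$ ("within distance"), $pdf^{WD}_j$ and $D(Q, R_d)$ are names assumed here.

```latex
% D(Q, R_d): disk of radius R_d centered at Q (assumed notation)
P^{WD}_{i}(R_d) = \iint_{D(Q, R_d)} pdf_{Tr_i}(x, y)\, dx\, dy

% pdf^{WD}_j is the derivative of P^{WD}_j with respect to R_d
P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d)
             \prod_{i \neq j} \bigl(1 - P^{WD}_{i}(R_d)\bigr)\, dR_d
```

The product term is the probability that every other object lies farther away than $R_d$, which is where the independence of the objects' locations is used.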

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ with the uncertainty disks of Tr1 to Tr5; labels Rmin, Rmax, points Z1 and Z2, and dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each over a disk of diameter 2r]

Example: assuming a uniform pdf over the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
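The convolution step can be illustrated with a minimal sketch in one dimension. The discrete grid, the uniform pmfs and the function names below are assumptions made for illustration; the slides work with continuous 2D pdfs over disks.

```python
from itertools import product

def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi + (-Vq) for independent discrete Vi and Vq.

    Each pmf maps an integer position to its probability.
    """
    out = {}
    for (xi, pi), (xq, pq) in product(pmf_i.items(), pmf_q.items()):
        out[xi - xq] = out.get(xi - xq, 0.0) + pi * pq
    return out

def prob_within(pmf_diff, rd):
    """P(|Vi - Vq| <= rd), read off the pmf of the difference."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)

# Hypothetical 1D stand-ins for the uncertainty disks: uniform over 2r+1 cells.
r = 2
pmf_i = {x: 1 / (2 * r + 1) for x in range(10 - r, 10 + r + 1)}  # centered at 10
pmf_q = {x: 1 / (2 * r + 1) for x in range(7 - r, 7 + r + 1)}    # centered at 7
pmf_iq = difference_pmf(pmf_i, pmf_q)
p_close = prob_within(pmf_iq, 1)
```

In this toy setting the difference is centered at 10 – 7 = 3 and its support spans 4r cells, twice the width of either input, so the uncertain query can be treated as a crisp point at the origin.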

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane; each has a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is the square root of a quadratic in t with leading coefficient A = Vxiq² + Vyiq²:

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points

Example: lower envelope of two distance trajectories

[Figure: distance functions TR1 and TR2 and their lower envelope LE12, with critical time points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]
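The "at most twice" observation follows from setting the squared distances equal, which leaves a quadratic in t. A minimal sketch, assuming each distance function has the form d(t)² = A·t² + B·t + C for linear relative motion; the function names and the sample motions are hypothetical.

```python
import math

def distance_coeffs(dx0, dy0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for a point that
    starts at offset (dx0, dy0) from the origin and moves with relative
    velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)

def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions are equal (at most two)."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:                       # difference is linear (or constant)
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical relative motions with respect to the query trajectory.
tr1 = distance_coeffs(1.0, 0.0, 0.0, 1.0)    # d^2 = t^2 + 1
tr2 = distance_coeffs(-4.0, 0.0, 1.0, 0.0)   # d^2 = (t - 4)^2
tr3 = distance_coeffs(0.0, 3.0, 0.0, 0.0)    # d = 3 throughout (same velocity as the query)
one_crossing = critical_times(tr1, tr2, 0.0, 10.0)
two_crossings = critical_times(tr3, tr2, 0.0, 10.0)
```

Comparing squared distances avoids the square roots entirely, which is valid because distances are non-negative.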

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three distance-versus-time plots over [tb, te]: the lower envelope LE12 of TR1 and TR2 (critical times t11, t12), the lower envelope LE34 of TR3 and TR4 (critical time t31), and their merge LE1234 with the new critical times t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) away from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., TR7 in the figure, and TR6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te], showing the lower envelope, the 4r pruning band above it, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the critical times t1, t2, t3]
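The idea of ranked lower envelopes can be sketched naively as follows. This is not the O(N log N) divide-and-conquer construction mentioned in the slides: it collects all pairwise critical times (quadratic in N) and ranks the distance functions at the midpoint of each elementary interval. The coefficient triples (A, B, C) of d(t)² = A·t² + B·t + C and all names are assumptions for illustration.

```python
import math
from itertools import combinations

def dist(c, t):
    """Distance function d(t) = sqrt(A t^2 + B t + C) for c = (A, B, C)."""
    return math.sqrt(max(c[0] * t * t + c[1] * t + c[2], 0.0))

def crossings(ci, cj):
    """Real roots of d_i(t)^2 = d_j(t)^2 (at most two)."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2 * a), (-b + s) / (2 * a)]

def ranked_envelopes(funcs, tb, te, levels=2):
    """Naive ranked lower envelopes.

    funcs maps a trajectory id to (A, B, C). Between two consecutive critical
    times the order of the distance functions cannot change, so ranking them
    at the midpoint of each elementary interval gives every level at once.
    Returns a list of (t_start, t_end, [id at level 1, id at level 2, ...]).
    """
    cuts = {tb, te}
    for (_, ci), (_, cj) in combinations(funcs.items(), 2):
        cuts.update(t for t in crossings(ci, cj) if tb < t < te)
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        order = sorted(funcs, key=lambda k: dist(funcs[k], mid))[:levels]
        if pieces and pieces[-1][2] == order:      # merge equal neighbours
            pieces[-1] = (pieces[-1][0], hi, order)
        else:
            pieces.append((lo, hi, order))
    return pieces

# Hypothetical (A, B, C) coefficients of three distance functions.
funcs = {
    "TR1": (1.0, 0.0, 1.0),     # d^2 = t^2 + 1
    "TR2": (1.0, -8.0, 16.0),   # d^2 = (t - 4)^2
    "TR3": (0.0, 0.0, 9.0),     # d = 3 throughout
}
pieces = ranked_envelopes(funcs, 0.0, 10.0)
```

The first id of every piece traces the lower envelope (the first level below the root), and the second id traces the envelope of the leftovers described in Observation 3.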

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

73

Combine the two probabilities…

[Figure: possible paths, and possible locations when t = 4, for object a; segment m is highlighted]

Probability of falling inside segment m:

p1 = 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(a(t) \in S \mid a(t) \in PL_p(t)\bigr)$, e.g. with a uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" along road-network segments around q (vertices A to E), annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) versus time (t) for a car on a highway, with query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with the speed bounds vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; the car is inside Q with probability 50% at one point in time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; transition probabilities 0.4, 0.2, 0.4 for an interior hole and 0.6, 0.4 at the border]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
0.4 0.6 0   0   0   0   0   0   0
0.4 0.2 0.4 0   0   0   0   0   0
0   0.4 0.2 0.4 0   0   0   0   0
0   0   0.4 0.2 0.4 0   0   0   0
0   0   0   0.4 0.2 0.4 0   0   0
0   0   0   0   0.4 0.2 0.4 0   0
0   0   0   0   0   0.4 0.2 0.4 0
0   0   0   0   0   0   0.4 0.2 0.4
0   0   0   0   0   0   0   0.6 0.4
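The evolution of the ball's position can be sketched as repeated vector-matrix multiplication. The helper names are assumptions; the matrix entries and the start in the third hole follow the wooden-board example.

```python
def step(v, M):
    """One hit: row vector v times transition matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9, stay=0.2, move=0.4):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border: stay 0.4, move right 0.6
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border: move left 0.6, stay 0.4
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M

M = board_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]       # ball starts in the third hole
v1 = step(v0, M)                        # after the first hit
v2 = step(v1, M)                        # after the second hit

# The vector shown for the 40th hit is a fixed point of M (pi * M == pi).
pi = [0.08] + [0.12] * 7 + [0.08]
pi_next = step(pi, M)
```

Checking the long-run vector as a fixed point avoids any claim about how close the chain is to it after exactly 40 hits.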

94

How can we model this

› First hit

(0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit

(0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit

(0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: state graph over s1, s2, s3 with the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

t = 0: (1, 0, 0)

98

99

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

100

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

101

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the worlds in s2 (probability 0.8) hit the window and are set aside
t = 3: (0, 0.16, 0.04); the remaining worlds reach s2 with probability 0.16
Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

States: s1, s2, s3, s4 (s4 is the new state for the winner trajectories)

M– =
0   0   1   0
0.6 0   0.4 0
0   0.8 0.2 0
0   0   0   1

M+ =
0 0 1   0
0 0 0.4 0.6
0 0 0.2 0.8
0 0 0   1

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)
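The two-matrix construction can be sketched in a few lines. The function names are assumptions, and the sketch assumes the query interval starts at tick 1 or later, so the start state itself is never tested against the window.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_matrices(M, window):
    """Build (M_minus, M_plus) with one extra absorbing 'hit' state.

    M_minus: ordinary transitions, used for ticks outside the query interval.
    M_plus:  transitions INTO a window state are redirected to the hit state.
    """
    n = len(M)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        hit = sum(p for j, p in enumerate(row) if j in window)
        kept = [0.0 if j in window else p for j, p in enumerate(row)]
        m_plus.append(kept + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(M, start, window, t_from, t_to):
    """P(object is in a window state at some tick in [t_from, t_to]).

    Assumes t_from >= 1: the start state at t = 0 is not checked.
    """
    m_minus, m_plus = window_matrices(M, window)
    v = [1.0 if i == start else 0.0 for i in range(len(M))] + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States are 0-indexed: s1 -> 0, s2 -> 1, s3 -> 2.
result = window_probability(M, start=0, window={0, 1}, t_from=2, t_to=3)
```

Because the hit state is absorbing, each possible world is counted once no matter how often it revisits the window, which is what a naive sum over ticks would get wrong.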

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space versus time space; one with a single observation at t0, one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

State order: (S1–, S2–, S3–, S1+, S2+, S3+), where "+" marks worlds that have hit the window (∎) and "–" marks worlds that have not (not ∎)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M– =
M 0
0 M

M+ =
0 0 1   0   0   0
0 0 0.4 0.6 0   0
0 0 0.2 0   0.8 0
0 0 0   0   0   1
0 0 0   0.6 0   0.4
0 0 0   0   0.8 0.2

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

107

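Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

State vector at t = 3, in the order (S1–, S2–, S3–, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

With ∎ = "the trajectory intersects the query window" and obs = "the object is observed in s3":

$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$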
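The doubled state space and the conditioning step can be sketched as follows. The function names and the row-vector convention are assumptions; the matrix M, the window {s1, s2} over T = [2, 3] and the observation in s3 at t = 3 follow the example.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def doubled_matrices(M, window):
    """(M_minus, M_plus) over the state order (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in M] + [zero + row[:] for row in M]
    m_plus = []
    for row in M:                       # rows of the "not hit yet" classes
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    m_plus += [zero + row[:] for row in M]   # "+" classes never go back
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, window={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # at t = 0 the object is in s1, no hit
v = vec_mat(v, m_minus)                  # t = 1 (outside T = [2, 3])
v = vec_mat(v, m_plus)                   # t = 2
v = vec_mat(v, m_plus)                   # t = 3

# Second observation: the object is seen in s3 at t = 3.
not_hit, hit = v[2], v[5]                # P(s3 and no hit), P(s3 and hit)
posterior = hit / (hit + not_hit)
```

Keeping a "+" copy of every state, rather than a single absorbing hit state, preserves the object's location after a hit, which is what makes conditioning on a later observation possible.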

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


23

24

25

26

27

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects

Are there efficient solutions to query uncertain spatial data?

In general: no

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]

Can be reduced to uncertain spatial databases

But: specific queries may have polynomial-time solutions

[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning: Aggregation

[Figure: query point q with uncertain objects C, D, H, I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded

q

C D

H

I

36

[Figure: query region centered at q; object labels A, B, C, H, H3 and the probabilities 0.8, 0.2, 0.4]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

37

[Figure: as on the previous slide, with the object labels A, B, C replaced by x]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e. substitute A, B, C by x

38

[Figure: as on the previous slide]

Generating Functions

39

[Figure: as on the previous slide]

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: anonymize objects, i.e. substitute A, B, C by x

Each monomial c·x^i implies that the probability of having exactly i results equals c

40

Generating Functions: Formally

For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_{i} (1 - p_i + p_i \cdot x) = \sum_{k} c_k \cdot x^k$

In the expanded polynomial, the coefficient c_k of the monomial x^k equals the probability that exactly k objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
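Expanding the generating function one factor at a time can be sketched as follows. The function name is an assumption, objects are assumed independent, and the probabilities 0.8, 0.2 and 0.4 are the ones shown in the running example figure.

```python
def count_distribution(probs):
    """Coefficients c[k] of prod_i (1 - p_i + p_i * x).

    c[k] is the probability that exactly k objects are inside the region,
    assuming the objects are independent.
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)     # object outside: count stays k
            nxt[k + 1] += c * p       # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

# Inside-probabilities 0.8, 0.2, 0.4 as in the running example figure.
dist = count_distribution([0.8, 0.2, 0.4])
```

For n objects this takes O(n²) multiplications, compared with the 2^n possible worlds that a naive enumeration of inside/outside combinations would visit.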

41

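Count Queries on Uncertain Data

Example

[Figure: query region centered at q; object labels A, B, C, H, H3 and the probabilities 0.8, 0.2, 0.4]

42

Count Queries on Uncertain Data

[Figure: as on the previous slide; the example's generating function is expanded step by step on this and the following slides]

43

Count Queries on Uncertain Data

[Figure: as on the previous slide]

44

Count Queries on Uncertain Data

[Figure: as on the previous slide]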

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
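The sampling procedure can be sketched for the count query of the running example. The function name, the fixed seed and the number of sampled worlds are assumptions; objects are assumed independent with inside-probabilities 0.8, 0.2 and 0.4.

```python
import random
from collections import Counter

def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(count = k) by materializing n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_worlds):
        # One possible world: each object is independently inside or outside.
        count = sum(rng.random() < p for p in probs)
        hits[count] += 1
    return {k: hits[k] / n_worlds for k in range(len(probs) + 1)}

estimate = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=20000)
```

With only a handful of sampled worlds the estimates fluctuate noticeably from run to run, which is why the estimate alone says nothing about its own reliability.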

48

[Figure: query region centered at q; object labels A, B, C, H, H3 and the probabilities 0.8, 0.2, 0.4]

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

[Figure: as on the previous slide]

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., the Wald test

where z is the corresponding percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability
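A Wald-type interval can be sketched as follows. The inputs are assumptions, since the estimator and the significance level are not preserved in this transcript: an estimate of 0.54 from 100 sampled worlds at a significance level of 0.05 is used here because it reproduces the interval [0.442, 0.638] quoted above.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # standard normal percentile
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Assumed inputs: an estimate of 0.54 from 100 sampled worlds at alpha = 0.05.
lo, hi = wald_interval(0.54, 100)
```

The half-width shrinks with the square root of the number of sampled worlds, so quadrupling the sample size halves the interval.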

50

Uncertain Spatial Data Management Summary

› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

› Data cleaning
– "Best guess" answers
– Unreliable results
– Biased results

› Paradigm of equivalent worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions

› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane as a 2D route]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various Sourcesrsquo Imprecision(GPS Sensors)

Quest for data reduction

Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

Modeling Uncertain Trajectories: Model 2

› Constraints: motion along a road network
› Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q and the time instants t1, t, t2]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

› Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

› Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints

› Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g. possible routes to be taken

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the axes X, Y and T, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the axes X, Y and T, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and so on for deeper levels, e.g. (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

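Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertain objects Tr1 … Tr5 with circular uncertainty regions; the distances Rmin, Rmax and Rd are marked, with Rmin > Rmax for Tr4]

Trq = a 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being an NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_{i,Q}(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\,dx\,dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \prod_{i \ne j} \bigl(1 - P^{WD}_{i,Q}(R_d)\bigr)\,dR_d$

where $pdf^{WD}_{j,Q}$ is the derivative of $P^{WD}_{j,Q}$ with respect to $R_d$, i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible values of Rd (NOTE: the upper bound is Rmax).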
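The pruning rule and the NN-probability above can be illustrated numerically. The following is a sketch under stated assumptions rather than the method of the slides: it replaces the integral by Monte-Carlo sampling, assumes uniform pdfs over the uncertainty disks, and takes Rmax to be the smallest of the maximal possible distances from Q. The names `sample_in_disk` and `nn_probabilities` and the example coordinates are hypothetical.

```python
import math
import random


def sample_in_disk(cx, cy, r, rng):
    """Draw one point uniformly at random from the disk with centre (cx, cy) and radius r."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def nn_probabilities(q, disks, samples=20000, seed=0):
    """Estimate, for every uncertain object, the probability of being the NN of the crisp point q."""
    rng = random.Random(seed)
    centre_dist = [math.dist(q, (cx, cy)) for cx, cy, _ in disks]
    # Assumption: Rmax is the smallest of the maximal possible distances from q.
    r_max = min(d + r for d, (_, _, r) in zip(centre_dist, disks))
    # Pruning: an object whose minimal possible distance exceeds Rmax can never be the NN.
    candidates = [j for j, d in enumerate(centre_dist)
                  if max(0.0, d - disks[j][2]) <= r_max]
    wins = [0] * len(disks)
    for _ in range(samples):
        drawn = {j: math.dist(q, sample_in_disk(*disks[j], rng)) for j in candidates}
        wins[min(drawn, key=drawn.get)] += 1
    return [w / samples for w in wins]


# Illustrative data: (centre x, centre y, radius) of three uncertainty disks.
disks = [(1.0, 0.0, 0.5), (0.0, 1.2, 0.5), (5.0, 5.0, 0.5)]
probs = nn_probabilities((0.0, 0.0), disks)
print(probs)  # the third object is pruned, so its estimate is exactly 0.0
```

Sampling is used here only because it keeps the example short; the slide's formula integrates over Rd instead.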

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:

[Figure: uncertain query object TrQ and the uncertain objects Tr1 … Tr5; Rmin and Rmax are marked, and two possible locations Z1 and Z2 satisfy dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide).

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane, each with a support of diameter 2r]

Example: assuming a uniform pdf of possible locations, the figure shows just two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Since Viq = Vi + (–Vq), independence implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq.

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the x-y plane, each with a support of diameter 4r; axis ticks at –40, 0 and +40]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
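A one-dimensional, discretized analogue makes the convolution argument and the two properties easy to check. The sketch below is an illustration under assumptions, not the 2D construction of the slides: both locations are taken to be uniform over five integer positions, and the helper names `convolve` and `mean` are hypothetical.

```python
def convolve(f, g):
    """Discrete convolution of two probability mass functions given as lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def mean(pmf, origin):
    """Expected value of a pmf whose k-th entry belongs to the location origin + k."""
    return sum((origin + k) * p for k, p in enumerate(pmf))


# Vi is uniform on {10, ..., 14}; Vq is uniform on {2, ..., 6} (illustrative values).
vi, vi_origin = [0.2] * 5, 10
vq, vq_origin = [0.2] * 5, 2

# -Vq lives on {-6, ..., -2}: reverse the pmf and negate the largest location.
neg_vq, neg_vq_origin = vq[::-1], -(vq_origin + len(vq) - 1)

viq = convolve(vi, neg_vq)
viq_origin = vi_origin + neg_vq_origin  # smallest possible value of Vi - Vq

print(len(vi) - 1, len(viq) - 1)  # the support width doubles: 4 8
print(mean(viq, viq_origin))      # close to E(Vi) - E(Vq) = 12 - 4 = 8
```

The doubled support mirrors the step from 2r to 4r in the figures, and the symmetric triangular result mirrors Property 2.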

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$

which is a hyperbola, since A > 0.

Now we have a collection of such distance functions (one for each i).

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).

Observation: Throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with the critical time-points t11 and t12 after tb; distance on the vertical axis, time on the horizontal axis]
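The critical time-points and a lower envelope can be computed directly from the quadratic form of the squared distance. The sketch below rests on assumptions: it takes d(t)² = A·t² + B·t + C with A, B, C derived from a relative start position and a relative velocity (linear motion), and it builds the envelope naively from all pair-wise intersections, which is simpler but slower than a merge-based construction. All function names and the example values are hypothetical.

```python
import math


def distance_coeffs(dx0, dy0, vx, vy):
    """(A, B, C) of d(t)^2 = A*t^2 + B*t + C for relative start (dx0, dy0) and velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)


def critical_times(f, g, tb, te):
    """Times in [tb, te] at which two distance functions intersect (at most two)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


def lower_envelope(funcs, tb, te):
    """Naive lower envelope as a list of (start, end, index of the lowest function)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        # Comparing squared distances is enough, because the square root is monotone.
        best = min(range(len(funcs)),
                   key=lambda k: funcs[k][0] * mid * mid + funcs[k][1] * mid + funcs[k][2])
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], hi, best)
        else:
            pieces.append((lo, hi, best))
    return pieces


f1 = distance_coeffs(-4.0, 1.0, 1.0, 0.0)  # closest to the query at t = 4
f2 = distance_coeffs(-8.0, 1.0, 1.0, 0.0)  # closest to the query at t = 8
envelope = lower_envelope([f1, f2], 0, 10)
print(envelope)  # function 0 is lowest on [0, 6], function 1 on [6, 10]
```

Each piece of the returned list corresponds to one interval of the answer, with its end points being critical time-points.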

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which adds the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1 … TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each of the |PPi| possible paths is selected with probability 1/|PPi|.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

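Combine the two probabilities…

[Figure: the possible paths and the possible locations of object a when t = 4; m marks a position on one of the paths]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with a uniform pdf):

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(S \cap PL^{a}_{p}(t)\bigr)$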
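The combination of the two probabilities can be written out in a few lines. This is a sketch under assumptions: each path is reduced to a 1D interval of possible locations with a uniform pdf, the segment S lies on one path only, and the names `overlap_fraction`, `prob_inside` and the path labels are hypothetical.

```python
def overlap_fraction(possible, segment):
    """Fraction of the interval `possible` covered by `segment` (uniform pdf along the path)."""
    lo = max(possible[0], segment[0])
    hi = min(possible[1], segment[1])
    return max(0.0, hi - lo) / (possible[1] - possible[0])


def prob_inside(paths, segment_on_path):
    """Sum over all possible paths of P(path) * P(location on that path lies inside S)."""
    total = 0.0
    for name, (p_path, possible) in paths.items():
        segment = segment_on_path.get(name)
        if segment is not None:
            total += p_path * overlap_fraction(possible, segment)
    return total


# Two equally likely paths; at the given time the possible locations span [0, 4] on each.
paths = {"P1": (0.5, (0.0, 4.0)), "P2": (0.5, (0.0, 4.0))}
segment = {"P1": (2.0, 4.0)}  # S covers half of the possible locations on P1 only
print(prob_inside(paths, segment))  # 0.5 * 0.5 = 0.25
```

The printed value matches the 0.5 · 0.5 = 0.25 example of the slide, although the geometry used here is made up.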

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbling" along road-network segments; annotations: earliest/latest times of arrival, and a possible path that is definitely not part of the answer to the range query]

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches
– Define an acceptable error
– Save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
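The idea of trading an acceptable error for a smaller data size can be illustrated with the basic recursive Douglas-Peucker line simplification. The sketch below is the textbook variant, not the sped-up algorithm of the cited paper; it measures the distance to the segment between the end points, and the names, the route and the error bound are illustrative assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed so that no dropped point is further than epsilon away."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= epsilon:
        return [points[0], points[-1]]
    split = k + 1  # index of the farthest point in `points`
    left = douglas_peucker(points[:split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right  # the split point is shared, so keep it only once


route = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 3.02), (6, 3)]
reduced = douglas_peucker(route, epsilon=0.1)
print(reduced)  # [(0, 0), (3, 0), (4, 3), (6, 3)]
```

Here seven samples shrink to four, and the chosen epsilon becomes a bound on the location uncertainty that the reduction introduces.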

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far …

› Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

› Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE 16(9), 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443

A highway example…

[Figure: position (pos) of a car on a highway over time (t), bounded by the minimum and maximum speed vmin and vmax, with a query window Q]

What is the probability that the car is in some area at least once during some time?

Assuming a uniform distribution, the car is inside Q with probability 50% at one time instant and with probability 30% at another.

› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints
› Consideration of dependency: = 0.5

An adequate model

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: transition probabilities of an inner hole: 0.4 (left), 0.2 (stay), 0.4 (right); of a border hole: 0.6 (move to the neighbour), 0.4 (stay)]

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

M =
0.4 0.6 0   0   0   0   0   0   0
0.4 0.2 0.4 0   0   0   0   0   0
0   0.4 0.2 0.4 0   0   0   0   0
0   0   0.4 0.2 0.4 0   0   0   0
0   0   0   0.4 0.2 0.4 0   0   0
0   0   0   0   0.4 0.2 0.4 0   0
0   0   0   0   0   0.4 0.2 0.4 0
0   0   0   0   0   0   0.4 0.2 0.4
0   0   0   0   0   0   0   0.6 0.4

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)
› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)
› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
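The hits can be replayed by repeated vector-matrix multiplication. The following is a minimal sketch in plain Python; the function names `build_transition_matrix` and `step` are hypothetical, and the matrix is the nine-bucket example from the slides.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the board example: rows are 'from bucket', columns 'to bucket'."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector of bucket probabilities by the transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_transition_matrix()
dist = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
history = []
for _ in range(40):
    dist = step(dist, M)
    history.append(dist)

print([round(p, 2) for p in history[0]])   # after the first hit
print([round(p, 2) for p in history[1]])   # after the second hit
print([round(p, 2) for p in history[39]])  # after the 40th hit
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M: multiplying it by M returns the same vector, so the distribution after many hits approaches it.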

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: the states s1, s2, s3 with the transition probabilities of M, unrolled over the time instants t = 0, t = 1, t = 2, t = 3]

Note: We have an exponential number of possible paths the car might take.

Propagating the state distribution from the start state s1:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

For the window query, the worlds that are inside the window at t = 2 (probability 0.8) are already hits. The remaining worlds, (0, 0, 0.2), evolve to (0, 0.16, 0.04) at t = 3, of which only the 0.04 in s3 never enters the window.

Result = 1 – 0.04 = 0.96

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories, and two matrices

ST - Window Queries

M– (transition to a time outside T) =
0   0   1   0
0.6 0   0.4 0
0   0.8 0.2 0
0   0   0   1

M+ (transition to a time inside T) =
0 0 1   0
0 0 0.4 0.6
0 0 0.2 0.8
0 0 0   1

State vectors over (s1, s2, s3, s4):

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)
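The two-matrix construction can be sketched in plain Python. This is an illustration under assumptions rather than the implementation of the cited paper: the helper names are hypothetical, the window is given as a set of state indices plus a set of time instants, and the start time t = 0 is assumed to lie outside the window.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def augmented_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing state for worlds that hit the window."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        # Probability mass that would enter a window state is redirected to the new state.
        hit = sum(p for j, p in enumerate(row) if j in window_states)
        m_plus.append([0.0 if j in window_states else p for j, p in enumerate(row)] + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(m, start_state, window_states, window_times, horizon):
    """Probability of being in a window state at least once during the window times."""
    m_minus, m_plus = augmented_matrices(m, window_states)
    v = [0.0] * (len(m) + 1)
    v[start_state] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start_state=0, window_states={0, 1},
                            window_times={2, 3}, horizon=3)
print(round(result, 2))  # 0.96
```

The cost is one sparse vector-matrix product per time step, instead of an enumeration of the exponentially many paths.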

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; left: a single observation at t0; right: two observations at t0 and t1]

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

[Figure: the states s1, s2, s3 unrolled over the time instants t = 0, t = 1, t = 2, t = 3]

Multiple Observations

State vectors over (S1–, S2–, S3–, S1+, S2+, S3+), where the + classes hold the worlds that hit the window (∎) and the – classes those that did not (not ∎):

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

M– =
M 0
0 M

M+ =
0 0 1   0   0   0
0 0 0.4 0.6 0   0
0 0 0.2 0   0.8 0
0 0 0   0   0   1
0 0 0   0.6 0   0.4
0 0 0   0   0.8 0.2

M =
0   0   1
0.6 0   0.4
0   0.8 0.2

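Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory passes the query window, and obs the observation of the object in s3.

$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$

With the state vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1–, S2–, S3–, S1+, S2+, S3+):

0.32 / (0.32 + 0.04) = 0.89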
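The doubled state space and the final conditioning step can be replayed in plain Python. This is a sketch under assumptions: the helper names are hypothetical, and the window is applied only in the transition to t = 2, which is the choice that reproduces the state vectors shown on the slides.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zeros = [0.0] * n
    hit_rows = [zeros + row[:] for row in m]  # worlds that already hit stay in the + classes
    m_minus = [row[:] + zeros for row in m] + hit_rows
    m_plus = []
    for row in m:
        miss = [0.0 if j in window_states else p for j, p in enumerate(row)]
        hit = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        m_plus.append(miss + hit)
    m_plus += hit_rows
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # the object starts in s1 and has not hit the window
for matrix in (m_minus, m_plus, m_minus):  # transitions to t = 1, t = 2, t = 3
    v = vec_mat(v, matrix)

observed = 2  # index of s3, the state of the second observation
p_hit_and_obs = v[len(M) + observed]
p_miss_and_obs = v[observed]
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
print(round(posterior, 2))  # 0.89
```

Worlds that end in s1 or s2 contradict the observation and simply drop out of the normalization.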

Summary

› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

Selected Works

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over the axes time and location; label: Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE 16(9), 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


Too many possible worlds

Querying Uncertain Data: Complexity

› Naive query processing is exponential in the number of objects
› Are there efficient solutions to query uncertain spatial data?
› In general: No
– "The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
– Can be reduced to uncertain spatial databases
› But: specific queries may have polynomial-time solutions

[DalviSuciu04] N. N. Dalvi, D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q]

› Return the number of objects located in the depicted circular region centered at query point q
› This number is a random variable
› Total number of possible worlds

Data Cleaning / Aggregation

[Figure: query point q with the objects C, D, H and I]

› Ignore uncertainty (data cleaning)
– Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
› Query results are not reliable
› Query results may be biased

Equivalent Worlds: An intuition

[Figure: query point q with the objects C, D, H and I]

Observation 1: For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds

› Observation 1 allows us to discard objects outside of the query region
› Remaining number of equivalent classes of possible worlds

Observation 2: For each remaining object we only need to consider the predicate "inside"

› Remaining number of equivalent classes of possible worlds

Observation 3: We only require the number of objects in the query region. Information about concrete result objects can be discarded.

Generating Functions

[Figure: query point q and the objects A, B and C, annotated with the probabilities 0.8, 0.2 and 0.4; further labels: H, H3]

› Main idea: use polynomial multiplication to enumerate possible results
› Observation 3: anonymize objects, i.e. substitute A, B, C by x
› Each monomial c·x^k implies that the probability of having exactly k results equals c

Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i x) = \sum_{k=0}^{N} c_k x^k$

In the expanded polynomial, the coefficient $c_k$ of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.

[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
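The expansion of the generating function is a sequence of polynomial multiplications, one per object. The sketch below is an illustration under assumptions: the function name `count_distribution` is hypothetical, and the three probabilities 0.8, 0.2 and 0.4 are read off the figure labels and assumed to be the probabilities of three objects being inside the query region.

```python
def count_distribution(inside_probs):
    """Coefficients c_0 .. c_N of the product over all objects of (1 - p_i + p_i * x)."""
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # the object is outside: the count stays at k
            nxt[k + 1] += c * p      # the object is inside: the count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities of three objects (taken from the figure labels).
coeffs = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in coeffs])  # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication costs time linear in the current degree, so the whole count distribution takes O(N²) time instead of an enumeration of 2^N possible worlds.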

Count Queries on Uncertain Data

[Figure: query point q and the objects A, B and C, annotated with the probabilities 0.8, 0.2 and 0.4; further labels: H, H3]

Example: [Equations: step-by-step expansion of the generating function for the depicted objects]

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time

Approximated Results: Sampling

› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of sampled results is an unbiased approximation of the true distribution of results
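The sampling steps above can be sketched for a count query. This is a minimal illustration under assumptions: objects are taken to be independent, each with a given probability of lying inside the query region (0.8, 0.2 and 0.4, as assumed from the figure labels), and the function name `sample_count_distribution` is hypothetical.

```python
import random
from collections import Counter


def sample_count_distribution(inside_probs, num_worlds=100, seed=0):
    """Estimate P(exactly k objects inside the query region) from sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_worlds):
        # One possible world: decide independently for each object whether it is inside.
        world = [rng.random() < p for p in inside_probs]
        # Evaluate the query (a count) on this world.
        counts[sum(world)] += 1
    return [counts[k] / num_worlds for k in range(len(inside_probs) + 1)]


estimates = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=20000)
print([round(e, 3) for e in estimates])
```

With only 100 worlds, as on the slides, the estimates fluctuate noticeably, which is why a confidence statement for the estimators is needed.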

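Sampling Example

[Figure: query point q and the objects A, B and C, annotated with the probabilities 0.8, 0.2 and 0.4; further labels: H, H3]

› Drawing 100 possible worlds may yield the following estimators
› Compare to the exact probabilities
› No indication of reliability or confidence of the estimations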

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the followingestimators

Use statistical methods to assess the quality of estimators

Eg Wald-Test

Where is the percentile of the standard normal distribution

At a significance level of the true probability is in the interval [0442 0638]

True probability
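A small sketch of the Wald interval. The estimate 0.54 is not stated in the text above; it is an assumption back-derived from the interval [0.442, 0.638] with 100 samples and a 5% significance level, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2) percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed estimate: 54 of 100 sampled worlds satisfy the predicate.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The interval shrinks with the square root of the number of sampled worlds, so halving its width takes four times as many samples.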

50

Uncertain Spatial Data Management: Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: "best guess" answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, together with the corresponding 2D route in the XY plane]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at time t1) and p2 (at time t2), with a point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m-1), te])]

[Figure: the trajectories Trq, Tr1, Tr2, and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1, and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2, and Tr3 in (X, Y, T) space with uncertainty, with the time instants T = tb, T = tb1, T = t1, T = t11, and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]. Its children are nodes of the form (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t_(m-1), te, Dm). Each node in turn has children over sub-intervals of its own interval, e.g., (Tr_i11, tb, t11, D11) and (Tr_i12, t11, t12, D12) below the first child]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q with the distances Rmin, Rmax, and a sample distance Rd, surrounded by the uncertainty disks of Tr1 to Tr5; annotation: Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

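The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_{i}(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y)\,dx\,dy$

where $D(Q, R_d)$ is the disk of radius Rd centered at Q

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \prod_{i \neq j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…); here $pdf^{WD}_{j}$ denotes the derivative of $P^{WD}_{j}$ with respect to Rd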
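The pruning observation and the NN probabilities for a crisp query point can be illustrated with a Monte-Carlo sketch. This replaces the integrals by sampling and is only an approximation under stated assumptions: uniform pdfs over disks of equal radius, independent objects, and Rmax taken as the smallest maximum distance of any object from Q. All names are hypothetical.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Draw a point uniformly from a disk (sqrt keeps the density uniform)."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(phi), center[1] + r * math.sin(phi)


def prune(q, centers, radius):
    """Indices of objects whose minimum distance from q is at most Rmax."""
    r_max = min(math.dist(q, c) + radius for c in centers)
    return [i for i, c in enumerate(centers)
            if max(math.dist(q, c) - radius, 0.0) <= r_max]


def nn_probabilities(q, centers, radius, num_samples=20000, seed=0):
    """Estimate, per object, the probability of being the NN of q."""
    rng = random.Random(seed)
    wins = [0] * len(centers)
    for _ in range(num_samples):
        dists = [math.dist(q, sample_in_disk(c, radius, rng)) for c in centers]
        wins[dists.index(min(dists))] += 1
    return [w / num_samples for w in wins]


q = (0.0, 0.0)
centers = [(1.0, 0.0), (1.5, 1.0), (10.0, 10.0)]
candidates = prune(q, centers, radius=1.0)
probs = nn_probabilities(q, centers, radius=1.0)
print(candidates, probs)
```

The third object is farther away than Rmax even at its closest possible location, so it is pruned and its sampled NN probability is exactly zero.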

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example

[Figure: the uncertainty disk of TrQ with the distances Rmin and Rmax, surrounded by the uncertainty disks of Tr1 to Tr5; Z1 and Z2 are two possible locations with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1, and Tr2 over the (x, y) plane, each a cylinder over a disk of diameter 2r]

Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi - Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the (x, y) plane (axis range -40 to +40), each with a support of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$

a hyperbola, since A > 0 (for linear motion, $A = V_{xiq}^2 + V_{yiq}^2$)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0)

Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set d_iq(t) = d_jq(t)). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2 over time, with the critical time-points t11 and t12 after tb. NOTE: time-dimension on the horizontal axis]
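The "at most twice" observation follows from squaring both distance functions and solving a quadratic. The sketch below assumes linear relative motion, so that the squared distance is A t^2 + B t + C; the helper names and the example numbers are hypothetical.

```python
import math


def squared_distance_coeffs(rel_pos, rel_vel):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for linear motion."""
    x0, y0 = rel_pos
    vx, vy = rel_vel
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if abs(a) < eps:
        # Difference is linear (or constant): at most one crossing.
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Object 1 passes by the query object; object 2 keeps a constant distance of 2.
d1 = squared_distance_coeffs((-3.0, 1.0), (1.0, 0.0))  # (t - 3)^2 + 1
d2 = squared_distance_coeffs((0.0, 2.0), (0.0, 0.0))   # constant 4
times = critical_times(d1, d2, 0.0, 10.0)
print(times)  # the two critical time-points 3 - sqrt(3) and 3 + sqrt(3)
```

Comparing squared distances avoids the square root and does not change the intersection times, since both distances are non-negative.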

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31), both over [tb, te], are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
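For illustration only, here is a naive lower-envelope construction. It is not the O(N log N) divide-and-conquer method named above: it collects all pairwise critical time-points (quadratically many) and picks the lowest function on each piece. Distance functions are assumed to be given as coefficients (A, B, C) of their squares; all names are hypothetical.

```python
import math


def crossing_times(f, g, eps=1e-12):
    """Times at which two squared-distance quadratics (A, B, C) are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        return [] if abs(b) < eps else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]


def lower_envelope(functions, tb, te):
    """Return pieces (start, end, index of the lowest function on the piece)."""
    times = {tb, te}
    for i in range(len(functions)):
        for j in range(i + 1, len(functions)):
            times.update(t for t in crossing_times(functions[i], functions[j])
                         if tb < t < te)
    cuts = sorted(times)
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = 0.5 * (start + end)
        # No crossings inside a piece, so the order at the midpoint holds throughout.
        best = min(range(len(functions)),
                   key=lambda k: functions[k][0] * mid * mid
                   + functions[k][1] * mid + functions[k][2])
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], end, best)  # merge with previous piece
        else:
            pieces.append((start, end, best))
    return pieces


envelope = lower_envelope([(1.0, -6.0, 10.0), (0.0, 0.0, 4.0)], 0.0, 10.0)
print(envelope)
```

In the example, the constant function is lowest first, the passing object takes over between the two critical time-points, and the constant function is lowest again afterwards.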

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t)

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4, with the point m marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a \in S \text{ at } t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(a \in S \mid PL_p(t))$ (e.g., uniform pdf)

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time intervals intersect with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with the vertices A, B, C, D, E and the query point q; the expansion tree "bubbles" along road-network segments; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query

  – Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

  – Easy to prove that the actual QP is monotonic in between consecutive critical time instants

  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error; save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not a "Z" axis… one cannot travel back in time
  – How long should the sink's data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far…

Stochastic methods for uncertain spatial data
- Probabilistic results
- Time dimension

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results
- Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t) on a highway, bounded by the minimum and maximum speeds vmin and vmax; the query window Q covers some area during some time]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

[Figure: at the two time steps covered by Q, 50% and 30% of the possible positions lie inside Q]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

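A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a ball in a row of holes on a wooden board; in the interior it moves left with probability 0.4, stays with 0.2, and moves right with 0.4; at the border it stays with 0.4 and moves to the neighbour hole with 0.6]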

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

M =
0.4  0.6  0    0    0    0    0    0    0
0.4  0.2  0.4  0    0    0    0    0    0
0    0.4  0.2  0.4  0    0    0    0    0
0    0    0.4  0.2  0.4  0    0    0    0
0    0    0    0.4  0.2  0.4  0    0    0
0    0    0    0    0.4  0.2  0.4  0    0
0    0    0    0    0    0.4  0.2  0.4  0
0    0    0    0    0    0    0.4  0.2  0.4
0    0    0    0    0    0    0    0.6  0.4

94

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
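The vector-matrix products above can be reproduced with a short sketch. The helper names are hypothetical. The code also checks that (0.08, 0.12, …, 0.12, 0.08) is a stationary distribution of M, i.e., the distribution that repeated hits approach; it makes no claim about how close the 40th step is to it.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the board example: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    return m


def step(dist, m):
    """One hit: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_transition_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first = step(start, M)    # approximately (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
second = step(first, M)   # approximately (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

stationary = [0.08] + [0.12] * 7 + [0.08]
after_one_more_hit = step(stationary, M)  # unchanged up to rounding
print(first, second, after_one_more_hit)
```

Because M is tridiagonal, a sparse representation would make each step linear in the number of states; the dense version is kept here for readability.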

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
0    0    1
0.6  0    0.4
0    0.8  0.2

[Figure: transition graph over the states s1, s2, s3, with edges labeled by the transition probabilities of M, unrolled over the time steps t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

State distributions when starting in s1:
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

Keeping only the worlds that have not yet been in s1 or s2 during T, the vector at t=3 is (0, 0.16, 0.04): 0.8 hit the window at t=2, another 0.16 hits it at t=3, and 0.04 never does. Result = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

States: s1, s2, s3, and the new state s4

M− =
0    0    1    0
0.6  0    0.4  0
0    0.8  0.2  0
0    0    0    1

M+ =
0    0    1    0
0    0    0.4  0.6
0    0    0.2  0.8
0    0    0    1

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)
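The two matrices can be derived mechanically from M and the set of window states, as the following sketch suggests. It rests on an assumption that matches the vectors above but is not spelled out in the text: M− is applied when stepping to a time outside T, and M+ (which redirects transitions into window states to the absorbing state) when stepping to a time inside T. Function names are hypothetical, states are 0-indexed, and the window is assumed to start at t ≥ 1.

```python
def vec_mat(v, m):
    """Multiply the row vector v by the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing state for trajectories that hit."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        hit = sum(row[j] for j in window_states)  # mass entering the window
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(m, start, window_states, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]), t_start >= 1."""
    m_minus, m_plus = window_matrices(m, window_states)
    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_start else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, t_start=2, t_end=3)
print(result)  # approximately 0.96
```

The cost is one vector-matrix product per time step, independent of the exponential number of possible paths.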

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
0    0    1
0.6  0    0.4
0    0.8  0.2

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

106

Multiple Observations

State order: (S1-, S2-, S3-, S1+, S2+, S3+); the classes Si+ are marked ∎ (window intersected), the classes Si- are marked not ∎

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M =
0    0    1
0.6  0    0.4
0    0.8  0.2

M− =
M  0
0  M

M+ =
0  0  1    0    0    0
0  0  0.4  0.6  0    0
0  0  0.2  0    0.8  0
0  0  0    0    0    1
0  0  0    0.6  0    0.4
0  0  0    0    0.8  0.2

107

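Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the window, and obs the observation:

$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg \blacksquare)}$

With the vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+): 0.32 / (0.32 + 0.04) = 0.89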
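A sketch of the doubled state space and the Bayes step. It assumes, as in the example, that the second observation is made at the last time step of the window and that M+ is applied when stepping to a time inside the window; the function names are hypothetical and states are 0-indexed.

```python
def vec_mat(v, m):
    """Multiply the row vector v by the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """M- and M+ over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    hit_rows = [zero + row[:] for row in m]  # once hit, always hit
    m_minus = [row[:] + zero for row in m] + hit_rows
    m_plus = []
    for row in m:
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + hit)
    return m_minus, m_plus + hit_rows


def hit_probability_given_observation(m, start_state, window_states,
                                      t_start, t_end, observed_state):
    """P(window intersected | object observed in observed_state at t_end)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start_state] = 1.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_start else m_minus)
    no_hit, hit = v[observed_state], v[n + observed_state]
    return hit / (hit + no_hit)  # undefined if the observation is impossible


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = hit_probability_given_observation(M, 0, {0, 1}, 2, 3, observed_state=2)
print(round(posterior, 2))  # 0.89
```

Keeping the position in the hit classes, instead of collapsing them into one absorbing state, is what makes conditioning on the later observation possible.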

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) over location and time]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Each monomial implies that the probability of having exactly results equals

40

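Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{n} \left(1 - p_i + p_i x\right) = \sum_{k=0}^{n} c_k x^k$$

In the expanded polynomial, the coefficient $c_k$ of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)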
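The polynomial expansion can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, and the three probabilities 0.8, 0.2 and 0.4 are assumed to be the "inside" probabilities read from the figure labels of the running example.

```python
def count_distribution(probs):
    """Coefficients c_0..c_n of prod_i (1 - p_i + p_i * x).

    c_k is the probability that exactly k objects are inside the query region,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the polynomial "1": no object considered yet
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object is outside: count stays k
            nxt[k + 1] += c * p      # object is inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities of the three candidate objects
dist = count_distribution([0.8, 0.2, 0.4])
for k, c in enumerate(dist):
    print(f"P(count = {k}) = {c:.3f}")
```

Each multiplication costs time linear in the current degree, so n objects take O(n²) time in total instead of enumerating all 2ⁿ inside/outside combinations.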

41

Count Queries on Uncertain Data

Example

[Figure: query point q with objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate $\phi$ and an uncertain database DB, we can answer $\phi$ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query $\phi$ on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results

48

Sampling: Example

[Figure: query point q with objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

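Sampling: Confidences

[Figure: query point q with objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]

True probability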
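A minimal sketch of sampling with a Wald interval follows. The helper names (`sample_count`, `estimate`, `wald_interval`), the seed, and the object probabilities 0.8, 0.2, 0.4 are assumptions made for illustration. The interval [0.442, 0.638] quoted on the slide is consistent with an estimate of 0.54 from 100 samples at the 95% level, which is what the last two lines reproduce.

```python
import random
from math import sqrt
from statistics import NormalDist


def sample_count(probs, rng):
    """Draw one possible world and return how many objects are inside."""
    return sum(1 for p in probs if rng.random() < p)


def estimate(probs, k, n_samples, rng):
    """Fraction of sampled worlds in which exactly k objects are inside."""
    hits = sum(1 for _ in range(n_samples) if sample_count(probs, rng) == k)
    return hits / n_samples


def wald_interval(p_hat, n_samples, alpha=0.05):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n_samples)
    return p_hat - half, p_hat + half


rng = random.Random(42)  # fixed seed only to make the run repeatable
p_hat = estimate([0.8, 0.2, 0.4], k=1, n_samples=100, rng=rng)
low, high = wald_interval(p_hat, 100)
print(f"estimate {p_hat:.2f}, 95% interval [{low:.3f}, {high:.3f}]")

# Interval for an estimate of 0.54 from 100 samples
slide_low, slide_high = wald_interval(0.54, 100)
```

The Wald interval is a normal approximation; it becomes unreliable for very small sample sizes or estimates close to 0 or 1.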

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data cleaning: "best guess" answers; unreliable results; biased results

Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and the corresponding 2D route in the (X, Y) plane]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q, t1, t2, t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

E.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time marks T = tb, T = tb1, T = t1, T = t11, T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals of its own interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and these again have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5; radii Rmin and Rmax around Q and a distance Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i}(R_d) = \iint_{\{(x,y)\,:\,\mathrm{dist}((x,y),\,Q) \le R_d\}} pdf_i(x,y)\,dx\,dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \prod_{i \ne j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$$

where $pdf^{WD}_{j}$ is the derivative of $P^{WD}_{j}$ with respect to Rd, i.e., Trj is the NN if it's at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
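The pruning observation can be sketched as follows. This is an illustration under stated assumptions: uncertainty regions are disks given as (center x, center y, radius), Rmax is taken to be the smallest of the objects' maximum possible distances to Q, and the function name `prune_candidates` and the coordinates are hypothetical.

```python
from math import hypot


def prune_candidates(q, disks):
    """Return (names of objects that can still be the NN of q, r_max).

    disks maps an object name to (cx, cy, r), a disk of possible locations.
    """
    bounds = {}
    for name, (cx, cy, r) in disks.items():
        d = hypot(cx - q[0], cy - q[1])
        bounds[name] = (max(0.0, d - r), d + r)  # (min, max) possible distance
    # Some object is certainly within r_max, so anything farther cannot be NN
    r_max = min(far for _, far in bounds.values())
    keep = sorted(n for n, (near, _) in bounds.items() if near <= r_max)
    return keep, r_max


# Hypothetical uncertainty disks around a crisp query point at the origin
disks = {
    "Tr1": (2.0, 0.0, 1.0),
    "Tr2": (0.0, 3.0, 1.0),
    "Tr4": (9.0, 0.0, 1.0),
}
candidates, r_max = prune_candidates((0.0, 0.0), disks)
print(candidates, r_max)
```

Only the surviving candidates need the expensive integration; pruned objects have NN-probability zero.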

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) drawn as uniform pdfs over disks of diameter 2r in the (x, y) plane]

Example (assuming a uniform pdf of possible locations): only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P\left(\lVert V_i - V_q \rVert \le R_d\right)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) in the (x, y) plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
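The convolution idea and Property 1 can be checked on a discretized toy example. The sketch below assumes independent, discrete 2D location distributions given as dictionaries from grid positions to probabilities; the function names and the numbers are hypothetical.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete 2D variables.

    This is the convolution of pmf(V_i) with pmf(-V_q).
    """
    out = defaultdict(float)
    for (xi, yi), pi in pmf_i.items():
        for (xq, yq), pq in pmf_q.items():
            out[(xi - xq, yi - yq)] += pi * pq
    return dict(out)


def expectation(pmf):
    """Expected position (centroid) of a discrete 2D distribution."""
    ex = sum(x * p for (x, _), p in pmf.items())
    ey = sum(y * p for (_, y), p in pmf.items())
    return ex, ey


def prob_within(pmf, radius):
    """Probability that the difference vector has length <= radius."""
    return sum(p for (x, y), p in pmf.items() if x * x + y * y <= radius * radius)


# Hypothetical possible locations of an object Tr_i and of the query object Tr_q
pmf_i = {(4, 0): 0.25, (5, 0): 0.5, (6, 0): 0.25}
pmf_q = {(0, 0): 0.5, (1, 0): 0.5}
pmf_iq = difference_pmf(pmf_i, pmf_q)

print(expectation(pmf_iq))     # centroid of the difference, E(V_i) - E(V_q)
print(prob_within(pmf_iq, 4))  # P(Tr_i within distance 4 of Tr_q)
```

Once the difference distribution is available, the uncertain query object can be treated as a crisp point at the origin, so the within-distance probability is again a plain sum (or a double integral in the continuous case).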

68

The continuous aspect of the probabilistic NN queries

Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = (V^x_{iq})^2 + (V^y_{iq})^2$$

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
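The critical time-points and a lower envelope can be sketched as below. This is a naive illustration that examines all pairs of distance functions, not the divide-and-conquer construction; it works on squared distances A·t² + B·t + C (which intersect at the same times as the distances themselves), and all function names and the example motion parameters are assumptions.

```python
from math import sqrt


def squared_distance_coeffs(dx0, dy0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative start position
    (dx0, dy0) and relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)


def critical_times(f, g, tb, te):
    """Times in [tb, te] at which two distance functions intersect (at most two)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if a == 0.0:
        roots = [] if b == 0.0 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if tb <= t <= te})


def lower_envelope(funcs, tb, te):
    """Naive lower envelope as a list of (t_start, t_end, index of closest)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)  # no intersections inside (lo, hi): test one point
        best = min(
            range(len(funcs)),
            key=lambda i: funcs[i][0] * mid * mid + funcs[i][1] * mid + funcs[i][2],
        )
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], hi, best)  # extend the previous piece
        else:
            pieces.append((lo, hi, best))
    return pieces


# Hypothetical example: one object resting at distance 1, one passing through the origin
f1 = squared_distance_coeffs(1.0, 0.0, 0.0, 0.0)
f2 = squared_distance_coeffs(-4.0, 0.0, 1.0, 0.0)
envelope = lower_envelope([f1, f2], 0.0, 10.0)
print(envelope)
```

Each piece of the envelope names the trajectory whose expected location is closest to the query trajectory during that sub-interval, which is the information stored in the first level of the IPAC-NN tree.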

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

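Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,

$$PP_i(a) = \{\, P_j : P_j \text{ connects } p_i \text{ and } p_{i+1},\ tc(P_j) \le t_{i+1} - t_i \,\}$$

where $tc$ denotes the minimum time cost

Example: if the pdf of selecting a possible path is uniform, each path in $PP_i(a)$ is selected with probability $1 / |PP_i(a)|$

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively, pi+1) within time period t - ti (respectively, ti+1 - t), i.e.,

$$PL_j(t) = \{\, p \in P_j : tc(p_i, p) \le t - t_i \ \wedge\ tc(p, p_{i+1}) \le t_{i+1} - t \,\}$$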

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4; segment m–p1]

Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):

$$\Pr(a \in S,\ t) = \sum_{P_j \in PP(a)} \Pr(P_j) \cdot \Pr\left(p \in S \mid p \in PL_j(t)\right)$$
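As a rough sketch of how the two probabilities combine, the code below assumes a uniform pdf over the possible locations on each path and represents positions as 1D offsets along a path. The function names and all offsets are hypothetical; they are chosen so that the result matches the 0.5 · 0.5 = 0.25 of the example.

```python
def overlap(a, b):
    """Length of the overlap of two 1D intervals given as (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths, segment_by_path):
    """Sum over paths of Pr(path) * Pr(location in segment | path).

    paths: name -> (path probability, (start, end) of possible locations at time t)
    segment_by_path: name -> (start, end) of the queried segment on that path;
    paths that do not contain the segment are simply absent.
    """
    total = 0.0
    for name, (path_prob, possible) in paths.items():
        seg = segment_by_path.get(name)
        length = possible[1] - possible[0]
        if seg is None or length <= 0.0:
            continue
        # uniform pdf: fraction of the possible locations covered by the segment
        total += path_prob * overlap(possible, seg) / length
    return total


# Two equally likely paths; the segment covers half of the possible
# locations on the first path and does not lie on the second one
paths = {"P1": (0.5, (2.0, 6.0)), "P2": (0.5, (1.0, 5.0))}
segment = {"P1": (2.0, 4.0)}
result = prob_in_segment(paths, segment)
print(result)
```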

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbling" along road-network segments; annotation: given the earliest/latest times of arrival, a possible path but definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not the "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far…

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the minimum and maximum speed (vmin, vmax), and a query region Q; the car is inside Q with probability 50% at one time and 30% at another]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole; 0.6 and 0.4 at the border]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

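› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$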

94

How can we model this

› First hit: $(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)$

› Second hit: $(0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)$

› 40th hit: $(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M^{40} = (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)$
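The vector-matrix products can be reproduced with a short sketch. It uses plain Python lists rather than a matrix library; the function names are hypothetical, and the chain parameters are the ones of the 9-bucket example (0.4/0.2/0.4 inside, 0.4 stay and 0.6 move at the border).

```python
def build_chain(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix of the bucket example (row: from bucket, column: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = border_move, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m


def step(dist, m):
    """One hit: multiply the row vector dist with the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_chain()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball starts in the third bucket
first = step(start, M)
second = step(first, M)
print([round(p, 2) for p in first])
print([round(p, 2) for p in second])

# The vector shown for the 40th hit is a fixed point of M (a stationary distribution)
stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(p, 2) for p in step(stationary, M)])
```

Repeated hits move the distribution towards this fixed point, which is why the initial position matters less and less as the number of hits grows.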

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

State distributions when starting in s1 at t = 0:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Worlds that are in s1 or s2 at t = 2 (probability 0.8) already satisfy the query. The remaining worlds, (0, 0, 0.2) at t = 2, yield (0, 0.16, 0.04) at t = 3, of which 0.16 is in s2. Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories and two matrices

103

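ST - Window Queries

With the states ordered (s1, s2, s3, s4), where s4 is the new state:

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$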
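The two-matrix scheme can be sketched as follows. This is an illustration under assumptions, not the published implementation: the function names are hypothetical, states are 0-based indices, the window is assumed to start at t ≥ 1, and the matrix M⁺ is used for every transition whose target time lies inside the query window.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def augment(m, window):
    """Build M- and M+ with one extra absorbing state for worlds that hit the window."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        hit = sum(p for j, p in enumerate(row) if j in window)
        # transitions into window states are redirected to the absorbing state
        m_plus.append([0.0 if j in window else p for j, p in enumerate(row)] + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus


def window_probability(m, start, window, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to]), t_from >= 1."""
    m_minus, m_plus = augment(m, window)
    v = start[:] + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# car starts in s1; window = {s1, s2} during T = [2, 3]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(round(result, 2))
```

The cost is one sparse vector-matrix product per time step, so the exponential number of possible paths never has to be enumerated.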

104

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time with a single observation at t0, and with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

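Multiple Observations

With the classes ordered $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$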

107

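Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With $\blacksquare$ the event that the trajectory intersects the query window and $o$ the observation of the object in s3:

$$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare) \cdot P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o \wedge \blacksquare) + P(o \wedge \neg\blacksquare)}$$

From the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:

$$\frac{0.32}{0.32 + 0.04} = 0.89$$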
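A sketch of the doubled state space and the final conditioning step is given below. The function names are hypothetical, states are 0-based indices, and the schedule (M⁻ for the step to t = 1, M⁺ for the steps to t = 2 and t = 3) is an assumption that matches the window T = [2,3] of the running example.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = []
    for row in m:
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)  # entering a window state switches to the + class
    m_plus += [zero + row[:] for row in m]  # + classes never switch back
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # object in s1 at t = 0, window not yet hit
v = vec_mat(v, m_minus)             # t = 1 (outside the query window)
v = vec_mat(v, m_plus)              # t = 2 (inside the query window)
v = vec_mat(v, m_plus)              # t = 3 (inside the query window)

# Second observation: the object is seen in s3 (index 2) at t = 3
n = len(M)
observed = 2
posterior = v[observed + n] / (v[observed] + v[observed + n])
print([round(p, 2) for p in v], round(posterior, 2))
```

Keeping the position inside the "+" classes, instead of collapsing all hit worlds into a single absorbing state, is what makes it possible to condition on a later observation.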

108

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries in the ST-space, with axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Replace uncertain objects by a deterministic ldquobest guessrdquo

Expected Positions

Most-likely Positions

hellip

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds: An Intuition

[Figure: query region around q]

Observation 1 allows us to discard objects outside of the query region.

Remaining number of equivalence classes of possible worlds

34

Equivalent Worlds: An Intuition

Observation 2

For each remaining object, we only need to consider the predicate “inside”.

Remaining number of equivalence classes of possible worlds

[Figure: query region around q with objects C, D, H and I]

35

Equivalent Worlds: An Intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

[Figure: query region around q with objects C, D, H and I]

36

Generating Functions

[Figure: query region around q; objects A, B, C and H; probability labels 0.8, 0.2 and 0.4; further label H3]

Main idea: use polynomial multiplication to enumerate possible results.

39

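Generating Functions

[Figure: query region around q; object labels replaced by x; probability labels 0.8, 0.2 and 0.4; further labels H and H3]

Main idea: use polynomial multiplication to enumerate possible results.

Observation 3: anonymize objects, i.e. substitute A, B, C by x.

Each monomial c·x^k implies that the probability of having exactly k results equals c.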

40

Generating Functions: Formally

For each object o_i, let p_i denote the probability that o_i lies inside the query region. Consider the following generating function [2]:

$\mathcal{F}(x) = \prod_{i=1}^{n} \bigl(p_i \cdot x + (1 - p_i)\bigr)$

In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly k objects are inside the query region.

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)

41

Count Queries on Uncertain Data

Example

[Figure: query region around q; objects A, B, C and H; probability labels 0.8, 0.2 and 0.4; further label H3]

[Equations: step-by-step expansion of the generating function for this example; not recoverable from the extracted text]
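The expansion for the running example can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the labels 0.8, 0.2 and 0.4 in the figure are the probabilities of the three objects being inside the query region, and the function name `count_distribution` is made up for this sketch.

```python
def count_distribution(inside_probs):
    """Expand prod_i (p_i * x + (1 - p_i)) and return its coefficients.

    coeffs[k] is the probability that exactly k objects are inside
    the query region (objects are assumed to be independent).
    """
    coeffs = [1.0]  # the polynomial "1": no object processed yet
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed inclusion probabilities taken from the figure labels.
dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication step costs time linear in the current polynomial length, so n objects need O(n²) operations instead of enumerating 2^n in/out combinations.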

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.

46

The Paradigm of Equivalent Worlds

47

Approximate Results: Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each sampled world.

The distribution of the sampled results is an unbiased approximation of the true distribution of results.

48

Sampling: Example

[Figure: query region around q; objects A, B, C and H; probability labels 0.8, 0.2 and 0.4; further label H3]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of the reliability or confidence of the estimations

49

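Sampling: Confidences

[Figure: query region around q; objects A, B, C and H; probability labels 0.8, 0.2 and 0.4; further label H3]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators.

E.g. Wald test: $\hat{p} \pm z \cdot \sqrt{\hat{p}\,(1-\hat{p})\,/\,|S|}$

where z is the $(1-\alpha/2)$ percentile of the standard normal distribution.

At a significance level of α, the true probability is in the interval [0.442, 0.638].

True probability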
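A minimal sketch of sampling possible worlds and attaching a Wald interval to the estimate. Several things here are assumptions and not taken from the slides: the inclusion probabilities 0.8, 0.2 and 0.4 are read off the figure labels, the estimated event ("exactly one object inside") is chosen for illustration, and the function names are hypothetical. The last lines only check arithmetic: an estimate of 0.54 from 100 samples with z = 1.96 gives the interval quoted above, which suggests (but does not prove) that the slide used a 5% significance level.

```python
import math
import random


def sample_count_probability(inside_probs, k, n_samples, rng):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    hits = 0
    for _ in range(n_samples):
        # One possible world: each object is independently inside or outside.
        count = sum(1 for p in inside_probs if rng.random() < p)
        if count == k:
            hits += 1
    return hits / n_samples


def wald_interval(p_hat, n_samples, z=1.96):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n_samples)
    return p_hat - half, p_hat + half


rng = random.Random(0)
estimate = sample_count_probability([0.8, 0.2, 0.4], k=1, n_samples=100, rng=rng)
low, high = wald_interval(estimate, 100)

# Arithmetic check against the interval quoted on the slide.
lo54, hi54 = wald_interval(0.54, 100)
print(round(lo54, 3), round(hi54, 3))  # 0.442 0.638
```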

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– “Best guess” answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the (X, Y) plane and the corresponding 3D trajectory in (X, Y, Time) space, up to the present time]

Approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

– Sequence of (location, time) updates (e.g. GPS-based)

– Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

– (location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

– Imprecision of various sources (GPS, sensors)

– Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints
– Motion along a road network
– Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering / pruning / refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time marks T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time marks T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); further levels refine each node into sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q, the uncertainty disks of Tr1 to Tr5, and the distances Rmin, Rmax and Rd]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has probability 0 of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
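The two formulas did not survive extraction. The following is a hedged reconstruction from the verbal description above, in the form commonly used for probabilistic NN queries; the symbols $P^{WD}_i$, $P^{NN}_j$ and $\mathit{pdf}_i$ are notation introduced here, not necessarily the slide's own.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q,
% where pdf_i is the location pdf of Tr_i:
P^{WD}_{i}(R_d) \;=\; \iint_{\{(x,y)\,:\,\|(x,y)-Q\| \le R_d\}} \mathit{pdf}_i(x,y)\,dx\,dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d AND every other object is farther away than R_d,
% integrated over all possible R_d up to R_max:
P^{NN}_{j} \;=\; \int_{0}^{R_{max}} \frac{d\,P^{WD}_{j}(R_d)}{d\,R_d}\;
    \prod_{i \ne j}\bigl(1 - P^{WD}_{i}(R_d)\bigr)\; dR_d
```

Beyond Rmax the product is zero, because the object that defines Rmax is then certainly within distance Rd, which is why the integration can stop there.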

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ, the uncertainty disks of Tr1 to Tr5, the distances Rmin and Rmax, and two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r, plotted over the x/y plane]

Example: assuming a uniform pdf of possible locations; only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq are shown.

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi and Vq are within distance Rd of each other.

Define Viq = Vi – Vq (cross-correlation) as another random variable:
Viq = Vi + (–Vq)
which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
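The convolution argument can be checked on a small discrete example. The sketch below is a deliberately simplified 1D, discrete stand-in for the 2D continuous case on the slide; the helper names and the concrete value ranges are assumptions made for illustration.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete variables (dict: value -> prob).

    This is the convolution of pmf_i with the mirrored pmf of V_q.
    """
    out = defaultdict(float)
    for vi, pi in pmf_i.items():
        for vq, pq in pmf_q.items():
            out[vi - vq] += pi * pq
    return dict(out)


def prob_within(pmf_diff, rd):
    """P(|V_i - V_q| <= rd), read off the pmf of the difference."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)


uniform_i = {x: 0.2 for x in range(3, 8)}  # V_i uniform on {3, ..., 7}
uniform_q = {x: 0.2 for x in range(0, 5)}  # V_q uniform on {0, ..., 4}

diff = difference_pmf(uniform_i, uniform_q)
print(round(prob_within(diff, 2), 3))  # 0.4
```

The difference of two uniform variables has a triangular pmf centered at the difference of the two centers, which mirrors Property 1 on the following slide.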

67

[Figure: the pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) of the differences, each with support of diameter 4r, plotted over the x/y plane]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

(a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)).
Call such pair-wise intersections critical time-points.

Example: lower envelope of two distance-trajectories

[Figure: distance (dist) over time; the distance functions TR1 and TR2, their lower envelope LE12, and the critical time-points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]
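Why two distance functions meet at most twice can be seen from a small sketch. It assumes linear relative motion, so that the squared distance is a quadratic A·t² + B·t + C; setting two such quadratics equal leaves at most a quadratic equation in t. The function names and the example trajectories are hypothetical.

```python
import math


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A*t^2 + B*t + C for a point that
    starts at (x0, y0) relative to the query object and moves with the
    relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal (at most two)."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


def nearest_at(coeffs, t):
    """Index of the distance function on the lower envelope at time t."""
    return min(range(len(coeffs)),
               key=lambda i: coeffs[i][0] * t * t + coeffs[i][1] * t + coeffs[i][2])


c1 = squared_distance_coeffs(1.0, 0.0, 1.0, 0.0)   # moving away from the query
c2 = squared_distance_coeffs(4.0, 0.0, -1.0, 0.0)  # approaching the query
print(critical_time_points(c1, c2, 0.0, 5.0))      # [1.5]
```

Before the critical time-point 1.5 the first trajectory forms the lower envelope, afterwards the second one does; a divide-and-conquer merge over all trajectories repeats exactly this pairwise test.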

70

The continuous aspect of the probabilistic NN queries

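Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three distance-over-time plots. Left: TR1 and TR2 with lower envelope LE12 and critical time-points t11, t12. Middle: TR3 and TR4 with lower envelope LE34 and critical time-point t31. Right: the merged lower envelope LE1234 of TR1 to TR4 over [tb, te], with the additional critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?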

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) away from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 4r pruning band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations at t = 4, with segment m]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr\bigl(a(t) \in S\bigr) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(S \mid PL^{a}_{p}(t)\bigr)$

e.g. uniform pdf

74

Construction / Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all the possible paths (and their probabilities)
– a probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distance to q is smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree “bubbles” along road-network segments. Annotation: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
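The network-expansion step can be sketched as a distance-bounded Dijkstra search. This is an illustrative simplification, not the procedure of the cited paper: it assumes that q coincides with a vertex, ignores edges that are only partially within distance r, and uses a made-up adjacency-list format and function name.

```python
import heapq


def network_expansion(adj, q, r):
    """Vertices whose network distance to q is smaller than r.

    adj maps a vertex to a list of (neighbor, edge_length) pairs.
    Returns (dist, parent): the network distances and, for each reached
    vertex, the vertex it was reached from (the expansion tree edges).
    """
    dist = {q: 0.0}
    parent = {}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


# Hypothetical road network around the query vertex "q".
adj = {
    "q": [("A", 2.0), ("B", 5.0)],
    "A": [("q", 2.0), ("C", 2.0)],
    "B": [("q", 5.0)],
    "C": [("A", 2.0), ("D", 3.0)],
    "D": [("C", 3.0)],
}
dist, parent = network_expansion(adj, "q", r=5.0)
print(sorted(dist))  # ['A', 'C', 'q']
```

Only the movement R-trees of the edges in the returned tree would then need to be fetched in the filtering step.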

77

Temporal-continuous query

– Only calculate the QPs for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path

– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
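As an illustration of trading an acceptable error for data size, here is a sketch of the basic recursive Douglas-Peucker line simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and the sample route and tolerance are invented for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep only the points needed so that no dropped point is farther
    than eps from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(route, eps=0.1)
print(simplified)  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The dropped point deviates by only 0.05 from the simplified route, which is within the chosen tolerance of 0.1; that tolerance then becomes a known bound on the location uncertainty.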

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

83

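A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

Assuming uniform distribution…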

86

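A highway example…

[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one time instant and 30% at the next]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t) with query window Q; probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t) with query window Q; probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5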

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; inner holes: 0.4 (left), 0.2 (stay), 0.4 (right); border holes: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4 0.2 0.4

› After second hit: 0.16 0.16 0.36 0.16 0.16

› After 40th hit: 0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08

93

How can we model this

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
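The vector-matrix products above can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper names are made up, and the check at the end uses the fact that the vector shown for the 40th hit is the stationary distribution of M (after 40 hits the exact values are close to it but not identical).

```python
def build_transition_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: row vector times matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third hole
after1 = step(start, M)
after2 = step(after1, M)
print([round(p, 2) for p in after1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

# The long-run vector from the slide is left unchanged by another hit.
stationary = [0.08] + [0.12] * 7 + [0.08]
unchanged = step(stationary, M)
```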

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: state graph over s1, s2 and s3 whose edges are labeled with the transition probabilities of M]

Note: we have an exponential number of possible paths the car might take.

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labeled with the transition probabilities of M]

Distribution over (s1, s2, s3), starting in s1:
– t = 0: (1 0 0)
– t = 1: (0 0 1)
– t = 2: (0 0.8 0.2)
– t = 3: (0.48 0.16 0.36)

Tracking only the worlds that have not yet been counted as hits: the mass 0.8 in s2 at t = 2 is a hit, the remaining mass 0.2 in s3 leads to (0 0.16 0.04) at t = 3, and the 0.16 in s2 is a hit as well. Only 0.04 never intersects the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

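States: s1, s2, s3, and the new state s4 that collects the trajectories which have intersected the window

M− =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M+ =
( 0  0  1    0   )
( 0  0  0.4  0.6 )
( 0  0  0.2  0.8 )
( 0  0  0    1   )

(1 0 0 0) → (0 0 1 0) → (0 0 0.2 0.8) → (0 0 0.04 0.96)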
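The sequence of vectors can be reproduced by multiplying with M− for transitions into time steps outside the window and with M+ for transitions into time steps inside it. The sketch below follows that reading of the slide; the function names are made up, and it assumes the start time itself is not part of the query window.

```python
# Four states: s1, s2, s3 and the absorbing "hit" state s4.
M_MINUS = [
    [0.0, 0.0, 1.0, 0.0],
    [0.6, 0.0, 0.4, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Transitions that would enter s1 or s2 are redirected to s4.
M_PLUS = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]


def step(dist, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def window_probability(start, window_times, horizon):
    """Probability of having intersected the window by time `horizon`."""
    dist = list(start)
    history = [dist]
    for t in range(1, horizon + 1):
        dist = step(dist, M_PLUS if t in window_times else M_MINUS)
        history.append(dist)
    return dist[3], history


prob, history = window_probability([1.0, 0.0, 0.0, 0.0], window_times={2, 3}, horizon=3)
print(round(prob, 2))  # 0.96
```

The cost is one vector-matrix product per time step, independent of the exponential number of possible paths.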

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time; left with a single observation at t0, right with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

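[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3; ∎ denotes “has intersected the window”, not∎ its negation]

State vector over (S1–, S2–, S3–, S1+, S2+, S3+):
– t = 0: (1 0 0 0 0 0)
– t = 1: (0 0 1 0 0 0)
– t = 2: (0 0 0.2 0 0.8 0)
– t = 3: (0 0 0.04 0.48 0.16 0.32)

M– =
( M  0 )
( 0  M )

M+ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )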

107

Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (written obs below)?

P(∎ | obs) = P(obs | ∎) · P(∎) / P(obs)
           = P(∎ ∧ obs) / P(obs)
           = P(∎ ∧ obs) / (P(obs ∧ ∎) + P(obs ∧ not∎))
           = 0.32 / (0.32 + 0.04) = 0.89

State vector at t = 3 over (S1–, S2–, S3–, S1+, S2+, S3+): (0 0 0.04 0.48 0.16 0.32)
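The doubled state space and the conditioning step can be sketched together. This is an illustrative reading of the slides, with assumptions stated plainly: the states are ordered (S1–, S2–, S3–, S1+, S2+, S3+), the second observation is "seen in s3 at t = 3", and the helper `augmented` is a hypothetical name for building M– and M+ from M.

```python
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
WINDOW_STATES = {0, 1}  # s1 and s2 (0-based indices)


def augmented(m, window_states, in_window):
    """Build the 2|S| x 2|S| matrix over (S1-, ..., Sn-, S1+, ..., Sn+).

    in_window=False gives M- (block diagonal); in_window=True gives M+,
    where transitions from a "-" state into a window state switch to "+".
    """
    n = len(m)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            hit = in_window and j in window_states
            big[i][j + n if hit else j] += m[i][j]  # from S_i^-
            big[i + n][j + n] += m[i][j]            # from S_i^+ (stays "+")
    return big


def step(dist, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # in s1 at t = 0, no hit yet
for t in (1, 2, 3):
    dist = step(dist, augmented(M, WINDOW_STATES, in_window=t in {2, 3}))

# Condition on the observation "object seen in s3 at t = 3".
p_hit_and_obs = dist[5]     # S3+
p_nohit_and_obs = dist[2]   # S3-
posterior = p_hit_and_obs / (p_hit_and_obs + p_nohit_and_obs)
print(round(posterior, 2))  # 0.89
```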

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: index leaf entries over time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


27

Too many possible worlds

28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects.

Are there efficient solutions for querying uncertain spatial data?

In general: no.

"The problem of answering queries on a probabilistic database D is #P-complete in the size of D." [DalviSuciu04]

This hardness result can be reduced to uncertain spatial databases.

But: specific queries may have polynomial-time solutions.

[DalviSuciu04] N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

29

Querying Uncertain Data: Running Example

[Figure: uncertain objects and a circular query region centered at query point q]

Return the number of objects located in the depicted circular region centered at query point q.

This number is a random variable.

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with uncertain objects C, D, H, I]

Ignore uncertainty (data cleaning): replace uncertain objects by a deterministic "best guess"

Expected positions

Most-likely positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds: An Intuition

Observation 1

For any possible world w and any possible world w' derived from w by changing the position of an object that lies outside the query region in both worlds, the query returns the same result in w and w'.

32

Equivalent Worlds: An Intuition

Observation 1 allows us to discard objects outside of the query region.

Remaining number of equivalent classes of possible worlds

34

Equivalent Worlds: An Intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside".

Remaining number of equivalent classes of possible worlds

[Figure: query point q with uncertain objects C, D, H, I]

35

Equivalent Worlds: An Intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

[Figure: query point q with uncertain objects C, D, H, I]

36

Generating Functions

[Figure: query point q with uncertain objects A, B, C; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results.

39

Generating Functions

[Figure: query point q; the objects are relabeled x; probability labels 0.8, 0.2, 0.4]

Main idea: use polynomial multiplication to enumerate possible results.

Observation 3: anonymize objects by substituting A, B, C by x.

Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$.

40

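Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \bigl( (1 - p_i) + p_i x \bigr) = \sum_{k=0}^{N} c_k x^k$$

In the expanded polynomial, the coefficient $c_k$ of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).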
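The polynomial multiplication can be sketched in a few lines of Python. This is only an illustration: the function name `count_distribution` is hypothetical, and reading the figure labels 0.8, 0.2, 0.4 as the probabilities of three objects being inside the query region is an assumption.

```python
def count_distribution(inside_probs):
    """Return coeffs where coeffs[k] = P(exactly k objects are inside).

    Multiplies the factors ((1 - p) + p * x) one object at a time, so the
    cost is quadratic in the number of objects rather than exponential.
    """
    coeffs = [1.0]  # the polynomial "1": zero objects, count 0 with probability 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object is outside: count stays k
            nxt[k + 1] += c * p       # object is inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed reading of the running example: three objects with
# inside-probabilities 0.8, 0.2 and 0.4.
dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # approximately [0.096, 0.472, 0.368, 0.064]
```

Under that assumption the expanded polynomial is (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x² + 0.064x³, so four coefficients replace the eight possible worlds.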

41

Count Queries on Uncertain Data

[Figure: query point q with uncertain objects A, B, C; probability labels 0.8, 0.2, 0.4]

Example

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each world.

The distribution of sampled results is an unbiased approximation of the true distribution of results.
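A minimal sketch of this sampling scheme for the count query, assuming each object is independently inside the query region with a known probability. The function name, the fixed seed, and the probabilities 0.8, 0.2, 0.4 are illustrative assumptions.

```python
import random


def sample_count_distribution(inside_probs, num_worlds, seed=0):
    """Estimate P(count = k) by drawing independent possible worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = [0] * (len(inside_probs) + 1)
    for _ in range(num_worlds):
        # One possible world: decide for every object whether it is inside.
        k = sum(1 for p in inside_probs if rng.random() < p)
        counts[k] += 1
    # Relative frequencies are the estimators of the result distribution.
    return [c / num_worlds for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

The estimators come without any indication of how far they may be from the exact probabilities.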

48

Sampling: Example

[Figure: query point q with uncertain objects A, B, C; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

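Sampling: Confidences

[Figure: query point q with uncertain objects A, B, C; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution, $\hat{p}$ is the estimator and $n$ is the number of samples.

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability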
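The interval can be reproduced with a short Python sketch. The estimator 0.54 and the significance level 0.05 are assumptions: they are not legible in the slide text, but together with n = 100 they yield the stated interval [0.442, 0.638].

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clamp to [0, 1], since the bounds describe a probability.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed inputs: 54 of 100 sampled worlds satisfy the predicate, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The normal approximation behind the Wald interval is known to be unreliable for small n or for estimates close to 0 or 1.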

50

Uncertain Spatial Data Management: Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data cleaning: "best guess" answers; unreliable results; biased results

Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries. Example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory along the Time axis; the present time is marked]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: time axis with the marker "now ->"]

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: labels p1, p2, q, t1, t2, t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

E.g. possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants tb, tb1, t1, t11, te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically at t11, all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Lower levels hold nodes of the same form over sub-intervals, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], [Tri121, t11, t121, D121]]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q with the distances Rmin and Rmax; uncertain objects Tr1 to Tr5 drawn as disks; a circle of radius Rd around Q]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_i(R_d) = \iint_{\mathrm{disk}(Q, R_d)} \mathrm{pdf}_i(x, y)\, dx\, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d} \prod_{i \neq j} \bigl( 1 - P^{WD}_i(R_d) \bigr)\, dR_d$$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
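The pruning observation can be sketched as follows. This is a hedged illustration: it assumes that every uncertain object is a disk given as (center x, center y, radius) and that Rmax is the smallest of the objects' maximum possible distances to Q. The function name and the coordinates are hypothetical.

```python
import math


def prune_candidates(q, disks):
    """Return the names of objects that can still be the NN of crisp point q.

    disks maps an object name to (cx, cy, radius) of its uncertainty disk.
    """
    def center_dist(d):
        return math.hypot(d[0] - q[0], d[1] - q[1])

    # Assumption: Rmax is the smallest maximum distance, i.e. some object is
    # guaranteed to be within Rmax of q in every possible world.
    r_max = min(center_dist(d) + d[2] for d in disks.values())

    survivors = []
    for name, d in disks.items():
        min_dist = max(0.0, center_dist(d) - d[2])
        if min_dist <= r_max:  # otherwise the NN probability is 0
            survivors.append(name)
    return sorted(survivors)


disks = {"Tr1": (1.0, 0.0, 0.5), "Tr2": (0.0, 1.8, 0.5), "Tr4": (5.0, 5.0, 0.5)}
print(prune_candidates((0.0, 0.0), disks))  # ['Tr1', 'Tr2']
```

Only the surviving objects need to enter the integral over Rd.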

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain querying object TrQ and uncertain objects Tr1 to Tr5, with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane; each is a cylinder of height 1/(πr²) over a disk of diameter 2r]

Example, assuming a uniform pdf of possible locations: only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq are shown

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi - Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq
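The convolution step can be illustrated on a discretized, one-dimensional analogue. This sketch is an assumption-laden simplification: positions are integers, the two variables are independent, and the function names are hypothetical. The 2D case works the same way with pairs of coordinates.

```python
def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent discrete Vi and Vq.

    Each pmf maps an integer position to its probability. The result is the
    convolution of pmf_i with the mirrored pmf_q.
    """
    out = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] = out.get(xi - xq, 0.0) + pi * pq
    return out


def prob_within(pmf_diff, rd):
    """P(|Vi - Vq| <= rd), read off the pmf of the difference."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)


# Uniform possible locations: object on {4, 5, 6}, query object on {0, 1, 2}.
third = 1.0 / 3.0
diff = difference_pmf({4: third, 5: third, 6: third}, {0: third, 1: third, 2: third})
print(prob_within(diff, 3))  # about 1/3: differences 2 and 3 carry 3/9
```

Once the pmf of the difference is available, the uncertain querying object can be treated like a crisp one located at the origin.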

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the x-y plane; each is cone-shaped over a disk of diameter 4r with peak value 3/(4πr²); the axis is marked from -40 to +40]

Let Viq = Vi + (-Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example: lower envelope LE12 of the two distance-trajectories TR1 and TR2, with critical time-points t11 and t12. NOTE: time dimension on the horizontal axis, dist on the vertical axis]
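Why two distance functions meet at most twice can be seen from a small sketch: squaring both sides of diq(t) = djq(t) leaves a quadratic equation in t. The sketch assumes linear motion of the expected location relative to the query, given as a relative start position and a relative velocity. All names and numbers are hypothetical.

```python
import math


def squared_distance_coeffs(rel_pos, rel_vel):
    """Coefficients (A, B, C) with d(t)^2 = A*t^2 + B*t + C."""
    px, py = rel_pos
    vx, vy = rel_vel
    return (vx * vx + vy * vy, 2.0 * (px * vx + py * vy), px * px + py * py)


def critical_time_points(ci, cj, tb, te):
    """Times in [tb, te] at which two distance functions intersect."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < 1e-12:
        # Degenerate case: the difference of the squares is linear in t.
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Object 1 keeps distance 1 from the query; object 2 passes through the
# query's position at t = 3, so its distance is |t - 3|.
c1 = squared_distance_coeffs((0.0, 1.0), (0.0, 0.0))
c2 = squared_distance_coeffs((-3.0, 0.0), (1.0, 0.0))
print(critical_time_points(c1, c2, 0.0, 10.0))  # [2.0, 4.0]
```

Between consecutive critical time-points the order of the two distance functions does not change, which is what the lower-envelope construction exploits.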

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with time instants t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t)

73

Combine the two probabilities…

[Figure: possible paths, and possible locations when t = 4, for object a; segment m]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr\bigl(a(t) \in S\bigr) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(PL_p(t) \in S\bigr)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
  – Only calculate the QPs for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
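As an illustration of trading a bounded error for data size, here is a sketch of the plain recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper, it measures distance to the chord segment, and the sample points and tolerance are made up.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the polyline."""
    if len(points) < 3:
        return list(points)
    worst_index, worst_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= epsilon:
        # Every intermediate point is within the acceptable error: drop them.
        return [points[0], points[-1]]
    left = douglas_peucker(points[:worst_index + 1], epsilon)
    right = douglas_peucker(points[worst_index:], epsilon)
    return left[:-1] + right  # the split point is shared, keep it once


route = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
print(douglas_peucker(route, epsilon=0.5))
# [(0.0, 0.0), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
```

The tolerance epsilon plays the role of the acceptable error: every dropped location is guaranteed to lie within epsilon of the simplified route.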

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not a "Z" axis… one cannot travel back in time
  – How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
- Probabilistic results
- Time dimension

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results
- Time dimension

82

Merge the approaches

› Idea
  – Discretize time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q; the reachable positions are bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: pos over t with query window Q, bounded by vmin and vmax]

Assuming uniform distribution…

87

A highway example…

[Figure: pos over t with query window Q; labels 50% and 30% at two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: pos over t with query window Q; labels 50% and 30% at two time instants]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g. Markov chain)
  – Discrete time + continuous space (e.g. Harris chain)
  – Continuous time + discrete space (e.g. Markov process)
  – Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities. Interior hole: 0.4 (one neighbour), 0.2 (stay), 0.4 (other neighbour). Border hole: 0.4 (stay), 0.6 (neighbour)]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

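How can we model this

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$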
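The board example can be replayed with plain vector-matrix products. This sketch assumes a row-vector convention (distribution times M) and builds the matrix from the probabilities shown above. The helper names are hypothetical.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                      # left border bucket
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4      # right border bucket
    for i in range(1, n - 1):                        # interior buckets
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: row vector dist multiplied by the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball in the third bucket
after_one = step(start, M)
after_two = step(after_one, M)
print([round(x, 2) for x in after_two])
# [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

# The vector shown for the 40th hit is left unchanged by one more hit,
# i.e. it is the stationary distribution that repeated hits approach.
stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in step(stationary, M)])
```

Because M is tridiagonal, each step touches only three entries per row, which is the sparsity that the matrix-based approach relies on.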

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state graph over s1, s2, s3 with the transition probabilities of M]

Note: we have an exponential number of possible paths the car might take

98

ST - Window Queries

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

Distribution over (s1, s2, s3), obtained by repeated multiplication with M:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

Following only the worlds that have not yet intersected the window:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0, 0.16, 0.04), starting from the probability 0.2 that is still outside the window at t = 2

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

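ST - Window Queries

States: s1, s2, s3, and the new state s4

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Distribution over (s1, s2, s3, s4):

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)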
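The two-matrix scheme can be sketched as follows. The reading used here is an assumption inferred from the numbers: M- is applied for transitions into time steps outside the window, and M+ redirects every transition that enters a window state into the absorbing state s4. Function and parameter names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector v multiplied by matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_probability(m, start, window_states, t_start, t_end):
    """P(object is in one of window_states at some time in [t_start, t_end]).

    Assumes t_start >= 1, i.e. the start distribution itself is not tested.
    """
    n = len(m)
    # M-: original transitions plus an absorbing extra state for "hit" worlds.
    m_minus = [list(row) + [0.0] for row in m] + [[0.0] * n + [1.0]]
    # M+: transitions into window states are redirected to the extra state.
    m_plus = []
    for row in m:
        redirected = sum(row[j] for j in window_states)
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [redirected])
    m_plus.append([0.0] * n + [1.0])

    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_start else m_minus)
    return v[n]  # probability collected in the absorbing state


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, t_start=2, t_end=3)
print(round(result, 2))  # 0.96
```

The number of multiplications grows linearly with the length of the time horizon, although the number of possible paths grows exponentially.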

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

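Multiple Observations

Distribution over (S1-, S2-, S3-, S1+, S2+, S3+):

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$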

107

Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With H denoting the event that the trajectory passes the query window and O the observation in s3:

$$P(H \mid O) = \frac{P(O \mid H) \cdot P(H)}{P(O)} = \frac{P(H \wedge O)}{P(O)} = \frac{P(H \wedge O)}{P(O \wedge H) + P(O \wedge \neg H)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

Distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0.16, 0.04, 0.48, 0, 0.32)
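The conditioning step can be sketched on the doubled state space. The sketch assumes that the window covers s1 and s2 at t = 2 only, which is the reading that reproduces the t = 3 vector (0, 0.16, 0.04, 0.48, 0, 0.32). All function and parameter names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector v multiplied by matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """Build M- and M+ over the states (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # M-: block-diagonal, worlds keep their hit / no-hit label.
    m_minus = [list(r) + zero for r in m] + [zero + list(r) for r in m]
    # M+: a not-yet-hit world entering a window state moves to the + copy.
    m_plus = []
    for r in m:
        stay = [0.0 if j in window_states else r[j] for j in range(n)]
        move = [r[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + move)
    m_plus += [zero + list(r) for r in m]
    return m_minus, m_plus


def hit_probability_given_observation(m, start_state, window_states,
                                      window_times, obs_time, obs_state):
    """P(window was hit | object observed in obs_state at obs_time)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start_state] = 1.0
    for t in range(1, obs_time + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    miss, hit = v[obs_state], v[n + obs_state]
    return hit / (hit + miss)  # undefined if the observation has probability 0


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
posterior = hit_probability_given_observation(
    M, start_state=0, window_states={0, 1}, window_times={2},
    obs_time=3, obs_state=2)
print(round(posterior, 2))  # 0.89
```

Only the two entries belonging to the observed state enter the ratio, so worlds that are inconsistent with the observation are discarded without being enumerated.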

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: index entries over the location and time axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But: how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


28

Querying Uncertain Data: Complexity

Naive query processing is exponential in the number of objects.

Are there efficient solutions to query uncertain spatial data?

In general: No.

“The problem of answering queries on a probabilistic database D is #P-complete in the size of D.” [DalviSuciu04]

This hardness result carries over to uncertain spatial databases by reduction.

But: specific queries may have polynomial-time solutions.

[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

[Figure: query point q with uncertain objects C, D, H and I]

Ignore uncertainty (data cleaning)

Replace uncertain objects by a deterministic “best guess”
– Expected positions
– Most-likely positions
– …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate ldquoinsiderdquo

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

q

C D

H

I

36

[Figure: query point q with uncertain objects A, B, C and H; probabilities 0.8, 0.2 and 0.4]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results


[Figure: query point q; the objects with probabilities 0.8, 0.2 and 0.4 are each labelled x]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c_j \cdot x^j$ implies that the probability of having exactly $j$ results equals $c_j$

40

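Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i \cdot x) = \sum_{j=0}^{N} c_j \cdot x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)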
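The polynomial expansion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, objects are assumed to be mutually independent, and the probabilities 0.8, 0.2 and 0.4 are assumed to be the inside-probabilities shown in the running example's figure.

```python
from typing import List


def count_distribution(inside_probs: List[float]) -> List[float]:
    """Expand prod_i (1 - p_i + p_i * x) and return its coefficients.

    coeffs[j] is the probability that exactly j objects are inside the
    query region, assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # empty product: zero objects, count 0 with probability 1
    for p in inside_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object outside: count stays j
            nxt[j + 1] += c * p      # object inside: count becomes j + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities, read off the running example's figure.
distribution = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in distribution])  # [0.096, 0.472, 0.368, 0.064]
```

Each object multiplies the current polynomial by one linear factor, so N objects cost O(N²) coefficient updates instead of enumerating all 2^N inside/outside combinations.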

41

Count Queries on Uncertain Data

Example

[Figure: query point q with uncertain objects A, B, C and H; probabilities 0.8, 0.2 and 0.4]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of sampled results is an unbiased approximation of the true distribution of results
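The sampling scheme above can be sketched as follows. This is a hedged illustration for the count query of the running example: `sample_count_distribution` is a hypothetical helper, each object is assumed to be inside the query region independently with the given probability, and the probabilities 0.8, 0.2, 0.4 are assumed values taken from the figure.

```python
import random
from collections import Counter


def sample_count_distribution(inside_probs, num_samples, seed=0):
    """Estimate P(count = k) by drawing independent possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        # One possible world: every object is independently inside or outside.
        inside = sum(1 for p in inside_probs if rng.random() < p)
        counts[inside] += 1  # evaluate the count query on this world
    return {k: counts[k] / num_samples for k in range(len(inside_probs) + 1)}


estimates = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
print(estimates)
```

With only 100 worlds the estimates fluctuate noticeably from run to run (change `seed` to see this), which is why a confidence statement is needed alongside the point estimates.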

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]

True probability
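A Wald interval can be computed as in the sketch below. The estimate 0.54 is an assumption: it is the midpoint of the interval quoted on the slide, and together with 100 samples and a significance level of 0.05 it reproduces that interval. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, num_samples, alpha=0.05):
    """Two-sided Wald confidence interval for an estimated probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / num_samples)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

Quadrupling the number of sampled worlds halves the width of the interval, which is the usual trade-off between sampling cost and confidence.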

50

Uncertain Spatial Data Management Summary

› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

› Data Cleaning
– “Best guess” answers
– Unreliable results
– Biased results

› Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, drawn up to the present time, together with its projection, the 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: locations p1 and p2 sampled at times t1 and t2 on a road network, with a query point q and a time instant t in between]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and time T, with the instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and time T, with the instants tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level children: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each deeper level splits the interval of its parent into sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

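Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5; the distances Rmin, Rmax and Rd are marked]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i,Q}(R_d) = \iint_{A_i(R_d)} pdf_i(x, y)\, dx\, dy$$

where $A_i(R_d)$ is the part of the uncertainty disk of Tri that lies within distance Rd of Q.

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \cdot \prod_{i \neq j} \left(1 - P^{WD}_{i,Q}(R_d)\right) dR_d$$

where $pdf^{WD}_{j,Q}$ is the derivative of $P^{WD}_{j,Q}$ with respect to Rd, i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd’s (NOTE: the upper bound is Rmax…)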

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1, …, Tr5; Rmin and Rmax are marked, as are two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2 drawn as cylinders of diameter 2r over the x-y plane]

Example: assuming a uniform pdf of possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
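The convolution argument can be illustrated with a small discrete analogue. The sketch below is an assumption-laden simplification: locations are one-dimensional and discrete (pmfs stored as dictionaries) rather than 2D continuous pdfs, and the helper names `difference_pmf` and `prob_within` are hypothetical.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i + (-V_q) for independent V_i and V_q.

    Summing p_i(x_i) * p_q(x_q) over all pairs with the same difference is
    exactly the convolution of pmf_i with the pmf of -V_q.
    """
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)


def prob_within(pmf_diff, rd):
    """P(|V_i - V_q| <= rd), read directly off the difference pmf."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)


# Hypothetical 1D example: the object is at 4 or 5, the query at 0 or 1.
diff = difference_pmf({4: 0.5, 5: 0.5}, {0: 0.5, 1: 0.5})
print(diff)                  # {4: 0.5, 3: 0.25, 5: 0.25}
print(prob_within(diff, 4))  # 0.75
```

Once the difference distribution is available, the uncertain query behaves like a crisp query located at the origin, which is what makes the observations for the crisp case reusable.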

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the x-y plane; the support of each convolved pdf now has diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

where B and C depend on the expected location at the start of the motion. This is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with the critical time-points t11 and t12 marked after tb; the time dimension is on the horizontal axis, the distance on the vertical axis]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
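The merge-based construction can be sketched as follows. This is an illustrative sketch, not the authors' algorithm: each trajectory is assumed to be given by the coefficients (A, B, C) of its squared distance A·t² + B·t + C to the query (comparing squared distances gives the same ordering as comparing distances), and the names `critical_time_points`, `merge` and `lower_envelope` are hypothetical.

```python
from math import sqrt


def value(f, t):
    """Squared distance A*t^2 + B*t + C at time t."""
    return f[0] * t * t + f[1] * t + f[2]


def critical_time_points(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where f and g intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if lo < t < hi]


def merge(funcs, left, right):
    """Merge two envelopes, each a list of (start, end, function index)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lo = max(left[i][0], right[j][0])
        hi = min(left[i][1], right[j][1])
        a, b = left[i][2], right[j][2]
        cuts = [lo] + critical_time_points(funcs[a], funcs[b], lo, hi) + [hi]
        for s, e in zip(cuts, cuts[1:]):
            mid = (s + e) / 2.0  # the winner cannot change between two cuts
            winner = a if value(funcs[a], mid) <= value(funcs[b], mid) else b
            if out and out[-1][2] == winner:
                out[-1] = (out[-1][0], e, winner)  # extend the previous piece
            else:
                out.append((s, e, winner))
        left_done, right_done = left[i][1] == hi, right[j][1] == hi
        if left_done:
            i += 1
        if right_done:
            j += 1
    return out


def lower_envelope(funcs, ids, tb, te):
    """Divide and conquer: split the trajectories, recurse, merge."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(funcs,
                 lower_envelope(funcs, ids[:half], tb, te),
                 lower_envelope(funcs, ids[half:], tb, te))


# Hypothetical squared-distance functions: (t-2)^2 + 1, constant 2, constant 10.
funcs = [(1.0, -4.0, 5.0), (0.0, 0.0, 2.0), (0.0, 0.0, 10.0)]
print(lower_envelope(funcs, [0, 1, 2], 0.0, 10.0))
# [(0.0, 1.0, 1), (1.0, 3.0, 0), (3.0, 10.0, 1)]
```

The breakpoints of the returned envelope (here t = 1 and t = 3) are the critical time-points at which the nearest neighbor by expected location changes.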

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; marked are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: the pdf of selecting a possible path may be uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

73

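Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with the segment mp1 highlighted]

Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S \text{ at } t) = \sum_{p \in PP^{a}(t)} \Pr(p) \cdot \Pr\left(S \cap PL^{a}_{p}(t)\right)$$

e.g. with a uniform pdf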

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with objects A, B, C, D and E around the query q. Annotations: expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
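To make the "acceptable error" idea concrete, here is a minimal sketch of the plain recursive Douglas-Peucker line simplification. It is not the sped-up variant of the cited paper, it treats a trajectory purely as a 2D polyline (ignoring time), and the names `point_segment_distance`, `douglas_peucker` and `epsilon` are illustrative choices.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only points needed so that no dropped point is farther than epsilon."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for k in range(1, len(points) - 1):
        d = point_segment_distance(points[k], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, k
    if dmax <= epsilon:
        return [points[0], points[-1]]  # everything in between is within the error
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


print(douglas_peucker([(0, 0), (1, 0), (2, 0), (3, 0)], 0.1))  # [(0, 0), (3, 0)]
print(douglas_peucker([(0, 0), (1, 1), (2, 0)], 0.5))          # [(0, 0), (1, 1), (2, 0)]
```

The error bound `epsilon` is exactly the location uncertainty that the receiver of the reduced trajectory has to accept in exchange for the saved bandwidth and energy.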

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009, 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway example…

[Figure: position (pos) of the car over time (t) with the query window Q; the car is inside the window with probability 50% at the first and 30% at the second time instant]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of an inner hole: 0.4 (left), 0.2 (stay), 0.4 (right); at the border: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

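› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$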

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
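The vector-matrix products of this example can be reproduced with a few lines of plain Python. This is a sketch under the assumption that the board has nine holes with the transition probabilities of the matrix M; the helper names `build_transition_matrix`, `step` and `propagate` are hypothetical.

```python
def build_transition_matrix(n=9):
    """Tridiagonal matrix: inner holes 0.4 / 0.2 / 0.4, border holes 0.4 / 0.6."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit of the board: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, m, hits):
    """Distribution of the ball after the given number of hits."""
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third hole
print([round(p, 2) for p in propagate(start, M, 1)])
print([round(p, 2) for p in propagate(start, M, 2)])
print([round(p, 2) for p in propagate(start, M, 40)])
```

The distribution (0.08, 0.12, …, 0.12, 0.08) is stationary for M, so after many hits the starting position of the ball is gradually forgotten.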

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state graph over s1, s2 and s3 with the transition probabilities of M as edge labels]

Note: We have an exponential number of possible paths the car might take


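ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

State distribution over time:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)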

102

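ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

State distribution over time, where worlds that were in s1 or s2 at t = 2 are removed before the last step:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.0, 0.16, 0.04)

Result = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices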

103

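ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$

States: s1, s2, s3, s4 (s4 is the new state that collects the trajectories which have hit the window)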
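The construction with an extra absorbing state can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the cited paper: the helper names `augment` and `window_probability` are hypothetical, time is discrete, and the start time is assumed to lie outside the query window.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def augment(m, window_states=frozenset()):
    """Add an absorbing "hit" state as the last row and column.

    Transitions into a window state are redirected to the hit state. With an
    empty window this yields M-, with the query window it yields M+.
    """
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    out[n][n] = 1.0  # once a world has hit the window it stays a hit
    for i in range(n):
        for j in range(n):
            if j in window_states:
                out[i][n] += m[i][j]
            else:
                out[i][j] = m[i][j]
    return out


def window_probability(m, start_state, window_states, window_times, t_end):
    """P(object is in a window state at some time in window_times)."""
    m_minus, m_plus = augment(m), augment(m, window_states)
    v = [0.0] * (len(m) + 1)
    v[start_state] = 1.0
    for t in range(1, t_end + 1):
        # Use M+ for transitions that lead into a time inside the window.
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, start_state=0, window_states={0, 1},
                            window_times={2, 3}, t_end=3)
print(round(result, 2))  # 0.96
```

The query costs one vector-matrix product per time step, regardless of how many possible paths the car could take.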

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; left: a single observation at t0; right: two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

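[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3; worlds that hit the window (∎) are distinguished from worlds that did not (not ∎)]

State order: (S1–, S2–, S3–, S1+, S2+, S3+)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$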

107

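Bayes’ Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory hits the window and obs the observation:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Using the state distribution (S1–, S2–, S3–, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3:

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$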
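The doubled state space and the final conditioning step can be sketched as below. As before this is a hedged illustration: `doubled` is a hypothetical helper, states are ordered (S1–, S2–, S3–, S1+, S2+, S3+), the window is {s1, s2} during T = [2, 3], and the second observation is assumed to be "object seen in s3 at t = 3".

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled(m, window_states=frozenset()):
    """Transition matrix over the classes (S1-, ..., Sn-, S1+, ..., Sn+).

    A "-" world entering a window state moves to the matching "+" class;
    "+" worlds stay "+" forever. An empty window yields M-, otherwise M+.
    """
    n = len(m)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            target = j + n if j in window_states else j
            out[i][target] += m[i][j]     # row of a "-" class
            out[i + n][j + n] += m[i][j]  # row of a "+" class
    return out


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled(M), doubled(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # first observation: s1 at t = 0, no hit yet
for t in (1, 2, 3):
    v = vec_mat(v, m_plus if t in (2, 3) else m_minus)
print([round(p, 2) for p in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]

# Second observation: the object is seen in s3 at t = 3 (classes S3- and S3+).
posterior = v[5] / (v[2] + v[5])
print(round(posterior, 2))  # 0.89
```

Keeping the "+" and "–" classes separate until the end is what allows the second observation to be applied with Bayes' theorem after the propagation.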

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But: how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


29

q

Querying Uncertain Data Running Example

Return the number of objects located in the depicted circular region centered at query point q

This number is a random variable

Total number of possible worlds

30

Data Cleaning Aggregation

q

C D

H

I

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic "best guess"

Expected Positions

Most-likely Positions

…

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

q

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

q

C D

H

I

36

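[Figure: query region around q with the uncertain objects A, B and C, annotated with the probabilities 0.8, 0.2 and 0.4]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results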

37

[Figure: query region around q; the objects annotated with 0.8, 0.2 and 0.4 are each relabeled x]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x


39

[Figure: query region around q; the objects annotated with 0.8, 0.2 and 0.4 are each relabeled x]

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$

40

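For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:

$$\mathcal{F} = \prod_{i=1}^{n} \left( (1 - p_i) + p_i \cdot x \right) = \sum_{k=0}^{n} c_k x^k$$

In the expanded polynomial, the coefficient $c_k$ of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region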

Generating Functions Formally

[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
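The polynomial expansion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, objects are assumed to be mutually independent, and the probabilities 0.8, 0.2 and 0.4 are taken from the figure labels of the running example.

```python
def count_distribution(inside_probs):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    inside_probs[i] is the (assumed known) probability that object i
    lies inside the query region.
    """
    coeffs = [1.0]  # the empty product: the constant polynomial 1
    for p in inside_probs:
        # Multiply the current polynomial by ((1 - p) + p * x).
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: exponent unchanged
            nxt[k + 1] += c * p      # object inside: exponent grows by one
        coeffs = nxt
    return coeffs


if __name__ == "__main__":
    # Hypothetical in-region probabilities read off the running example.
    for k, prob in enumerate(count_distribution([0.8, 0.2, 0.4])):
        print(f"P(count = {k}) = {prob:.3f}")
```

Each object costs one pass over the current coefficients, so n objects need O(n²) operations in total instead of enumerating all 2ⁿ inside/outside combinations.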

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
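The sampling scheme above might look as follows for the count query of the running example. This is a sketch under assumptions: objects are independent, each object is reduced to its probability of lying inside the query region, and the name `sample_count_distribution` and the `seed` parameter are illustrative only.

```python
import random
from collections import Counter


def sample_count_distribution(inside_probs, num_worlds, seed=0):
    """Estimate P(count = k) by materializing num_worlds possible worlds."""
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    counts = Counter()
    for _ in range(num_worlds):
        # One possible world: every object is independently inside or outside.
        inside = sum(1 for p in inside_probs if rng.random() < p)
        # Evaluate the query (a count) on this world.
        counts[inside] += 1
    n = len(inside_probs)
    return {k: counts[k] / num_worlds for k in range(n + 1)}


if __name__ == "__main__":
    print(sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100))
```

With only 100 worlds the estimates fluctuate noticeably from run to run, which is why the estimators need an accompanying confidence statement.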

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\hat{p}(1-\hat{p}) / |S|}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability
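A Wald interval can be computed with the Python standard library alone. The sketch below is illustrative: the function name `wald_interval` is hypothetical, and the inputs in the usage example (an estimate of 0.54 from 100 sampled worlds at α = 0.05) are assumptions chosen because they reproduce the interval [0.442, 0.638] quoted above.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, num_samples, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    # (1 - alpha/2) percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / num_samples)
    # Clamp to [0, 1] because the bounds describe a probability.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


if __name__ == "__main__":
    low, high = wald_interval(0.54, 100, alpha=0.05)
    print(f"[{low:.3f}, {high:.3f}]")
```

The Wald interval is known to be unreliable for estimates close to 0 or 1 and for small sample sizes, so other interval constructions may be preferable in those cases.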

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: "best guess" answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: axes X, Y and Time; a 3D trajectory up to the present time and its 2D route]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
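The bead (space-time prism) between two exact samples can be expressed as a simple membership test. The sketch below assumes free-space motion in 2D with Euclidean distance and a known speed bound `v_max`; the function name `in_bead` and the tuple layout of the samples are illustrative choices, not taken from the cited papers.

```python
from math import hypot


def in_bead(p, t, sample1, sample2, v_max):
    """Check whether position p at time t is possible between two samples.

    sample1 = (t1, (x1, y1)) and sample2 = (t2, (x2, y2)) are exact
    observations with t1 <= t2; v_max bounds the object's speed.
    """
    (t1, p1), (t2, p2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    # p must be reachable from the first sample in the elapsed time ...
    reachable_from_start = hypot(p[0] - p1[0], p[1] - p1[1]) <= v_max * (t - t1)
    # ... and the second sample must still be reachable from p afterwards.
    can_reach_end = hypot(p2[0] - p[0], p2[1] - p[1]) <= v_max * (t2 - t)
    return reachable_from_start and can_reach_end


if __name__ == "__main__":
    s1, s2 = (0.0, (0.0, 0.0)), (10.0, (10.0, 0.0))
    print(in_bead((5.0, 3.0), 5.0, s1, s2, v_max=2.0))
    print(in_bead((5.0, 9.0), 5.0, s1, s2, v_max=2.0))
```

At a fixed time instant the set of admissible positions is the intersection of two disks, and a necklace is a sequence of such beads over consecutive samples.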

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: road network with the samples p1 and p2 at times t1 and t2, a point q, and a time instant t between t1 and t2]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: the trajectories Trq, Tr1, Tr2 and Tr3 over the axes X and Y, with the time marks T = tb, T = t1 and T = te]

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2 and Tr3 over the axes X and Y, with the time marks T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations:

Adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each lower level subdivides the interval of its parent, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and further (Tri121, t11, t121, D121), …]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1 … Tr5, with the distances Rmin, Rmax and Rd; for one object Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
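One common way to write these two probabilities, assuming independent objects and a crisp query point Q, is sketched below. The notation ($P^{WD}_i$ for "within distance", $pdf_i$ for the location density of Tri) is illustrative and may differ from the original slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q:
% integrate the location pdf of Tr_i over the disk of radius R_d around Q.
P^{WD}_{i}(R_d) \;=\; \iint_{\{(x,y)\,:\,\|(x,y) - Q\| \le R_d\}} pdf_i(x, y)\; dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d and every other object is farther away,
% integrated over all distances up to R_max.
P^{NN}_{j} \;=\; \int_{0}^{R_{max}} \frac{d P^{WD}_{j}(R_d)}{d R_d}
      \;\prod_{i \neq j} \bigl(1 - P^{WD}_{i}(R_d)\bigr)\; dR_d
```

The integration can stop at Rmax because beyond that distance at least one object is certainly closer to Q, so the product term vanishes.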

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and the uncertainty disks of Tr1 … Tr5, with Rmin, Rmax and two points Z1 and Z2 such that dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each with a support of diameter 2r]

Example: assuming a uniform pdf of possible locations, only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
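The convolution argument can be illustrated in one dimension with discretized distributions. The sketch below is a simplification: it uses 1D probability mass functions over equally spaced cells rather than the 2D pdfs of the slides, and the names `convolve` and `difference_pmf` are hypothetical.

```python
def convolve(pmf_a, pmf_b):
    """Discrete convolution: the pmf of the sum of two independent variables."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q = V_i + (-V_q) for independent V_i and V_q.

    Mirroring pmf_q gives the pmf of -V_q. Index k of the result
    corresponds to the offset k - (len(pmf_q) - 1).
    """
    return convolve(pmf_i, pmf_q[::-1])


if __name__ == "__main__":
    uniform = [0.25] * 4  # uniform location over four cells
    print(difference_pmf(uniform, uniform))
```

Two uniform distributions of width 4 yield a triangular distribution of width 7 centered at offset 0, the 1D analogue of the widened support that results from convolving two uniform disks.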

67

[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane, each with a support of diameter 4r; axis marks at -40, 0 and +40]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

which is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)
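The coefficients can be made explicit under the assumption of linear motion. In the sketch below, $(x_{iq}, y_{iq})$ denotes the expected relative position at $t = 0$; this symbol is introduced here for illustration only.

```latex
% Expected relative position at time t under linear motion:
%   (x_{iq} + V^{x}_{iq} t,\; y_{iq} + V^{y}_{iq} t)
d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\, t)^2 + (y_{iq} + V^{y}_{iq}\, t)^2}
          = \sqrt{A t^2 + B t + C}

A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2, \qquad
B = 2\,(x_{iq} V^{x}_{iq} + y_{iq} V^{y}_{iq}), \qquad
C = x_{iq}^2 + y_{iq}^2

% Completing the square shows the hyperbola in the (t, d) plane:
d_{iq}(t)^2 - A \left(t + \frac{B}{2A}\right)^2 = C - \frac{B^2}{4A}
```

A is strictly positive whenever the relative velocity is non-zero; the right-hand side of the last line is the squared minimum distance, reached at $t = -B/(2A)$.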

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points

[Figure: Example. Lower envelope LE12 of the two distance functions of TR1 and TR2, with the critical time points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) are merged into LE1234 over [tb, te], which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now how about the IPAC-NN tree?
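To make the notion of a lower envelope concrete, the following sketch computes it by brute force. It is deliberately not the divide-and-conquer construction described above: it collects all pairwise critical time points and checks the minimum on each sub-interval, which costs on the order of N² log N rather than N log N. Each distance function is assumed to be given by the coefficients (A, B, C) of its squared distance A·t² + B·t + C; all function names are hypothetical.

```python
from math import sqrt


def sq_dist(f, t):
    """Squared distance A*t^2 + B*t + C of one trajectory at time t."""
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f, g, tb, te, eps=1e-12):
    """Critical time points in [tb, te] where two distance functions meet."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:                      # difference is linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:                             # at most two intersections
            s = sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return [t for t in roots if tb <= t <= te]


def lower_envelope(funcs, tb, te):
    """Return [(index, start, end), ...]: which function is lowest when."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(crossings(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0             # no crossing strictly inside (lo, hi)
        best = min(range(len(funcs)), key=lambda k: sq_dist(funcs[k], mid))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], hi)   # extend previous piece
        else:
            pieces.append((best, lo, hi))
    return pieces


if __name__ == "__main__":
    # d0(t)^2 = t^2 + 1 and d1(t)^2 = 2 cross at t = -1 and t = +1.
    print(lower_envelope([(1.0, 0.0, 1.0), (0.0, 0.0, 2.0)], -3.0, 3.0))
```

Comparing squared distances avoids square roots without changing the ordering, since all distances are non-negative.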

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 … TR7 over [tb, te] with the critical time points t1, t2 and t3; shown are the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t), i.e.

73

Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4, including a segment starting at m]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \cap PL_p(t))$, e.g., with a uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with the vertices A, B, C, D, E and the query point q. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; expansion tree "bubbling" along road-network segments]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data

Probabilistic results

Time dimension

vs

Geometric models for uncertain spatio-temporal data

Probabilistic results

Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with the query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) with the query region Q; the reachable positions are bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) over time (t) with the query region Q; the reachable positions are bounded by vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: position (pos) over time (t) with the query region Q; annotations 50 and 30]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t) with the query region Q; annotations 50 and 30]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t) with the query region Q; annotations 50 and 30]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; transition probabilities 0.4, 0.2, 0.4 for an interior hole and 0.6, 0.4 at the border]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

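› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$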

94

How can we model this

› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)$

› Second hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{2} = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)$

› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)$
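The repeated vector-matrix multiplication can be reproduced with plain Python lists. This is a minimal sketch: the helper names `build_chain`, `step` and `after_hits` are hypothetical, and the transition probabilities are the ones of the nine-hole board example.

```python
def build_chain(n=9):
    """Transition matrix of the board example: M[i][j] = P(hole i -> hole j)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                        # left border
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                  # right border
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                             # interior hole
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, m, hits):
    """Distribution over the holes after the given number of hits."""
    for _ in range(hits):
        dist = step(dist, m)
    return dist


if __name__ == "__main__":
    start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball in hole 3
    chain = build_chain()
    for hits in (1, 2, 40):
        print(hits, [round(p, 2) for p in after_hits(start, chain, hits)])
```

With a growing number of hits the distribution approaches the stationary distribution of the chain, which is why the starting hole matters less and less.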

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2 and s3 with the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

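Initial distribution at t = 0: $(1, 0, 0)$, with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

[Figure: the states s1, s2, s3 and their transitions, unrolled over the time steps t = 0, t = 1, t = 2, t = 3]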

98

99

ST - Window Queries

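Initial distribution at t = 0: $(1, 0, 0)$, with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

[Figure: the states s1, s2, s3 and their transitions, unrolled over the time steps t = 0, t = 1, t = 2, t = 3]

Distribution at t = 1: $(0, 0, 1)$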

100

ST - Window Queries

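Initial distribution at t = 0: $(1, 0, 0)$, with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

[Figure: the states s1, s2, s3 and their transitions, unrolled over the time steps t = 0, t = 1, t = 2, t = 3]

Distribution at t = 1: $(0, 0, 1)$

Distribution at t = 2: $(0, 0.8, 0.2)$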

101

ST - Window Queries

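Initial distribution at t = 0: $(1, 0, 0)$, with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

[Figure: the states s1, s2, s3 and their transitions, unrolled over the time steps t = 0, t = 1, t = 2, t = 3]

Distribution at t = 1: $(0, 0, 1)$

Distribution at t = 2: $(0, 0.8, 0.2)$

Distribution at t = 3: $(0.48, 0.16, 0.36)$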

102

ST - Window Queries

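Initial distribution at t = 0: $(1, 0, 0)$, with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

[Figure: the states s1, s2, s3 and their transitions, unrolled over the time steps t = 0, t = 1, t = 2, t = 3]

Distribution at t = 1: $(0, 0, 1)$

Distribution at t = 2: $(0, 0.8, 0.2)$

Distribution at t = 3: $(0.0, 0.16, 0.04)$

Result = 0.96

› The solution is based on matrix multiplications: it introduces a new state for the winner trajectories and two matrices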

103

ST - Window Queries

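$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4): $(1, 0, 0, 0) \rightarrow (0, 0, 1, 0) \rightarrow (0, 0, 0.2, 0.8) \rightarrow (0, 0, 0.04, 0.96)$

[Figure: the states s1, s2, s3 and the additional state s4]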
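The construction of the two matrices can be sketched as follows. This is an illustration under assumptions, not the implementation from the cited paper: the function names `augmented` and `window_probability` are hypothetical, states are indexed from 0, the query window is assumed not to contain t = 0, and M⁺ is applied for every transition that ends at a time inside the window.

```python
def augmented(m, window_states):
    """Build M- and M+ with one extra absorbing state for "window was hit"."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    # M-: ordinary transitions, the extra state keeps its probability mass.
    minus = [list(row) + [0.0] for row in m] + [absorbing]
    # M+: transitions into a window state are redirected to the extra state.
    plus = []
    for row in m:
        hit = sum(row[j] for j in window_states)
        plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    plus.append(absorbing)
    return minus, plus


def step(dist, m):
    """Row vector times matrix."""
    return [sum(dist[i] * m[i][j] for i in range(len(m))) for j in range(len(m[0]))]


def window_probability(m, window_states, window_times, start_state, horizon):
    """P(object visits a window state at some time in window_times)."""
    minus, plus = augmented(m, window_states)
    t_lo, t_hi = window_times
    dist = [0.0] * (len(m) + 1)
    dist[start_state] = 1.0
    for t in range(1, horizon + 1):
        dist = step(dist, plus if t_lo <= t <= t_hi else minus)
    return dist[-1]  # mass collected in the absorbing "hit" state


if __name__ == "__main__":
    M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
    # s1 and s2 are the states 0 and 1; the window is T = [2, 3].
    print(window_probability(M, {0, 1}, (2, 3), start_state=0, horizon=3))
```

Because hit worlds are collected in an absorbing state, each world is counted once even if it visits the window several times, which is exactly what the independence assumption gets wrong.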

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, t = 1, t = 2, t = 3]

106

Multiple Observations

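State vectors over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:

t = 0: $(1, 0, 0, 0, 0, 0)$

t = 1: $(0, 0, 1, 0, 0, 0)$

t = 2: $(0, 0, 0.2, 0, 0.8, 0)$

t = 3: $(0, 0.16, 0.04, 0.48, 0, 0.32)$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, t = 1, t = 2, t = 3, with the label ¬∎]

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$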

107

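Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

Here ∎ denotes the event that the trajectory passes the query window and obs the observation of the object in s3. The values 0.32 and 0.04 are the entries for $S_3^+$ and $S_3^-$ of the state vector $(0, 0.16, 0.04, 0.48, 0, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$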
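The doubled state space and the final conditioning step can be sketched as below. The names `doubled`, `step` and `hit_probability_given_observation` are hypothetical, states are indexed from 0, and the sketch applies M⁺ for every transition that ends at a time inside the window. Entries of the intermediate vector other than the two s3 entries may therefore differ from the vector printed above, but the conditional probability only depends on those two entries.

```python
def doubled(m, window_states):
    """M- and M+ over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # M-: no window active, worlds stay in their "-" or "+" half.
    minus = [list(row) + zero for row in m] + [zero + list(row) for row in m]
    plus = []
    for row in m:                         # rows leaving a "-" class
        r = [0.0] * (2 * n)
        for j, p in enumerate(row):
            # Entering a window state moves the world to the "+" half.
            r[j + n if j in window_states else j] += p
        plus.append(r)
    for row in m:                         # rows leaving a "+" class stay "+"
        plus.append(zero + list(row))
    return minus, plus


def step(dist, m):
    """Row vector times matrix."""
    return [sum(dist[i] * m[i][j] for i in range(len(m))) for j in range(len(m[0]))]


def hit_probability_given_observation(m, window_states, window_times,
                                      start_state, obs_state, obs_time):
    """P(window hit | object observed in obs_state at obs_time)."""
    n = len(m)
    minus, plus = doubled(m, window_states)
    t_lo, t_hi = window_times
    dist = [0.0] * (2 * n)
    dist[start_state] = 1.0
    for t in range(1, obs_time + 1):
        dist = step(dist, plus if t_lo <= t <= t_hi else minus)
    hit, no_hit = dist[obs_state + n], dist[obs_state]
    return hit / (hit + no_hit)           # Bayes: P(hit and obs) / P(obs)


if __name__ == "__main__":
    M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
    # Window: s1 or s2 during T = [2, 3]; observation: s3 at t = 3.
    print(hit_probability_given_observation(M, {0, 1}, (2, 3), 0, 2, 3))
```

The division fails if the observation has probability zero under the model, so a production version would need to handle that case explicitly.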

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: location over time]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Event queries

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 30: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

30

Data Cleaning Aggregation

[Figure: query region q and uncertain objects C, D, H, I]

Ignore Uncertainty (Data Cleaning)

Replace uncertain objects by a deterministic “best guess”
– Expected positions
– Most-likely positions
– …

Query results are not reliable

Query results may be biased

31

Equivalent Worlds An intuition

q

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

32

Equivalent Worlds An intuition

[Figure: query region q]

Observation 1 allows us to discard objects outside of the query region

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate “inside”

[Figure: query region q and uncertain objects C, D, H, I]

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object we only need to consider the predicate “inside”

Remaining number of equivalent classes of possible worlds

[Figure: query region q and uncertain objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

[Figure: query region q and uncertain objects C, D, H, I]

36

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

[Figure: query region q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

37

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

[Figure: query region q; the objects A, B, C are relabeled x; probability labels 0.8, 0.2, 0.4]


39

Generating Functions

Main idea: use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

[Figure: query region q; the objects A, B, C are relabeled x; probability labels 0.8, 0.2, 0.4]

40

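Generating Functions: Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \bigl(1 - p_i + p_i\,x\bigr) = \sum_{j=0}^{N} c_j\,x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)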
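The generating-function idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, and the inside-probabilities 0.8, 0.2 and 0.4 are assumed to be the values attached to the three objects in the running figure.

```python
from typing import List


def count_distribution(probs: List[float]) -> List[float]:
    """Expand prod_i (1 - p_i + p_i * x).

    Entry k of the result is the probability that exactly k objects
    are inside the query region (objects assumed independent).
    """
    coeffs = [1.0]  # the constant polynomial "1"
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed inside-probabilities, read off the figure labels.
dist = count_distribution([0.8, 0.2, 0.4])
for k, pk in enumerate(dist):
    print(f"P(count = {k}) = {pk:.3f}")
```

Each multiplication step costs time linear in the current degree, so N objects need O(N²) operations instead of enumerating all 2^N possible worlds.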

41

Count Queries on Uncertain Data

[Figure: query region q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

[Example: step-by-step expansion of the generating function for this figure; the equations are not preserved in the extracted text]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and unbiased

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
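The sampling scheme can be sketched as follows for a count query. This is a minimal sketch under stated assumptions: objects are independent, each is inside the query region with a known probability (0.8, 0.2, 0.4 are assumed values taken from the running figure), and `sample_count_distribution` is a hypothetical helper name.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_samples, seed=0):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        # One possible world: every object is independently inside or outside.
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [counts[k] / n_samples for k in range(len(probs) + 1)]


estimates = sample_count_distribution([0.8, 0.2, 0.4], n_samples=10_000)
for k, est in enumerate(estimates):
    print(f"estimated P(count = {k}) = {est:.3f}")
```

The estimates fluctuate from run to run (here fixed by the seed), which is why a confidence statement about them is needed.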

48

Sampling Example

[Figure: query region q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

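Sampling Confidences

[Figure: query region q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]

True probability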
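The interval quoted above can be reproduced with a short sketch. The estimate 0.54 is an assumption: it is the midpoint of [0.442, 0.638], and together with n = 100 samples and a 95% level it yields exactly that interval. `wald_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(0.54, 100)
print(f"[{low:.3f}, {high:.3f}]")  # [0.442, 0.638]
```

The half-width shrinks with the square root of n, so halving the interval requires four times as many sampled worlds.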

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– “Best guess” answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: axes X, Y and Time; labels: present time, 2d-ROUTE, 3d-TRAJECTORY]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints
– Motion along road network
– Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q, t1, t2, t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time marks T = tb, T = tb1, T = t1, T = t11, T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level children: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine each interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 … Tr5; radii Rmin, Rmax and Rd around Q; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i}(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\,dx\,dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d)\,\prod_{i \ne j}\bigl(1 - P^{WD}_{i}(R_d)\bigr)\,dR_d$$

where $pdf^{WD}_{j}$ is the derivative of $P^{WD}_{j}$ with respect to $R_d$; i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
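As a sanity check on the pruning observation, the NN probabilities for a crisp query point can be approximated by sampling instead of integrating. This is a sketch under assumptions that are not part of the slides: every object is uniformly distributed in a disk of radius `r`, objects are independent, and the function names are hypothetical.

```python
import random
from math import cos, sin, pi, sqrt, hypot


def sample_in_disk(rng, cx, cy, r):
    """Draw a point uniformly from the disk with center (cx, cy) and radius r."""
    radius = r * sqrt(rng.random())  # sqrt gives uniform density over the area
    angle = 2.0 * pi * rng.random()
    return cx + radius * cos(angle), cy + radius * sin(angle)


def nn_probabilities(q, centers, r, n_samples=20_000, seed=0):
    """Estimate, per object, the probability of being the NN of the crisp point q."""
    rng = random.Random(seed)
    wins = [0] * len(centers)
    for _ in range(n_samples):
        dists = []
        for cx, cy in centers:
            x, y = sample_in_disk(rng, cx, cy, r)
            dists.append(hypot(x - q[0], y - q[1]))
        wins[dists.index(min(dists))] += 1
    return [w / n_samples for w in wins]


# The third object is always farther from q than Rmax = 3 (the smallest
# max-distance among all objects), so it can never be the nearest neighbor.
probs = nn_probabilities(q=(0.0, 0.0), centers=[(2.0, 0.0), (0.0, 3.0), (10.0, 0.0)], r=1.0)
print(probs)
```

The third entry is exactly zero in every run, which is the Rmax pruning rule stated in the observation.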

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1 … Tr5, with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of diameter 2r in the x-y plane]

Example: assuming a uniform pdf of possible locations, two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

$$P\bigl(\|V_i - V_q\| \le R_d\bigr)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the x-y plane, each with a support of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A\,t^2 + B\,t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

where B and C depend on the relative position at the start of the interval; a hyperbola since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: [Figure: lower envelope LE12 of the two distance-trajectories TR1 and TR2, with critical time-points t11 and t12 after tb]

NOTE: time-dimension on horizontal axis
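The “at most twice” observation follows from squaring both distance functions and solving a quadratic. The sketch below assumes linear relative motion, i.e. each squared distance has the form A·t² + B·t + C; the helper names `dist_sq_coeffs` and `critical_times` are hypothetical.

```python
from math import sqrt


def dist_sq_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of the squared distance from the origin of a
    point that starts at (x0, y0) and moves with velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(c1, c2, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal."""
    a, b, c = (u - v for u, v in zip(c1, c2))
    if abs(a) < eps:
        if abs(b) < eps:
            return []          # parallel or identical: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        root = sqrt(disc)
        roots = sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Object 1 stays at distance 1; object 2 passes through the origin at t = 3.
c1 = dist_sq_coeffs(1.0, 0.0, 0.0, 0.0)
c2 = dist_sq_coeffs(-3.0, 0.0, 1.0, 0.0)
print(critical_times(c1, c2, tb=0.0, te=10.0))  # [2.0, 4.0]
```

Between two consecutive critical time-points the ordering of the two distance functions cannot change, which is what makes the lower envelope piecewise well defined.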

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj that a can reach from pi within the time period t - ti and from which a can reach pi+1 within the time period ti+1 - t

73

Combine the two probabilitieshellip

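[Figure: possible paths of object a and its possible locations when t = 4; segment m-p1]

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr\bigl(a(t) \in S\bigr) = \sum_{p \in PP(a)} \Pr(p)\cdot\Pr\bigl(a(t) \in S \mid p\bigr)$$

where the second factor is evaluated over the possible locations PL of a along path p at time t

E.g. with uniform pdfs, the probability of falling inside segment m-p1 is 0.5 · 0.5 = 0.25 (at t = 4)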

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997
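For orientation, the basic (not the sped-up) Douglas-Peucker simplification with an acceptable error `epsilon` can be sketched as below. This is an illustrative sketch only: it works on plain 2D points, measures the distance to the chord segment, ignores the time dimension of a trajectory, and the function names are hypothetical.

```python
from math import hypot


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed so that no dropped point is farther
    than epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = max(
        ((i, _point_segment_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(route, epsilon=0.1))
```

The chosen `epsilon` is the location error that the reduced data is allowed to introduce, which is how data reduction itself becomes a source of uncertainty.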

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long should the sink data be allowed to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

83

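A highway example…

[Figure: position (pos) over time (t) with query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position over time with speed bounds vmin and vmax and query region Q]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position over time with speed bounds vmin and vmax and query region Q]

Assuming uniform distribution…

86

A highway example…

[Figure: position over time; the query region Q is hit with probability 50% at one time and 30% at the next]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: position over time; the query region Q is hit with probability 50% at one time and 30% at the next]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position over time; the query region Q is hit with probability 50% at one time and 30% at the next]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5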

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4 / 0.2 / 0.4 for an inner hole and 0.6 / 0.4 at the border]

92

A simple example

› Initial position

› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› After 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

93

How can we model this

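› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$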

94

How can we model this

› First hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0)\cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)\cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0)\cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$

where M is the 9 × 9 transition matrix of the example (rows 0.4 0.6 at the left border, 0.4 0.2 0.4 for inner buckets, 0.6 0.4 at the right border)
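The propagation of the ball's position distribution can be sketched with plain lists. This is only an illustration of the vector-matrix products above; `build_chain` and `step` are hypothetical helper names, and the nine-bucket layout with border probabilities 0.4/0.6 is taken from the example matrix.

```python
def build_chain(n=9):
    """Transition matrix of the board example (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = build_chain()
dist = [0.0] * 9
dist[2] = 1.0                       # the ball starts in the third bucket
for hit in range(1, 41):
    dist = step(dist, M)
    if hit in (1, 2, 40):
        print(hit, [round(p, 2) for p in dist])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is a fixed point of `step`, i.e. the stationary distribution that repeated hits drive the ball towards.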

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]

Note: we have an exponential number of possible paths the car might take

State distribution over (s1, s2, s3), starting in s1:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)

Keeping only the worlds that have not yet intersected the window: 0.8 of the probability mass hits the window at t = 2 (in s2); the remaining 0.2 (in s3) yields (0, 0.16, 0.04) at t = 3, i.e. another 0.16 hits the window

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

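$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4), where s4 is the new state:

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)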
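The two-matrix scheme can be sketched as follows. It is a minimal illustration, not the authors' code: the helper names are hypothetical, and it assumes that M⁺ is applied for transitions arriving at a time inside the query window and M⁻ otherwise, which reproduces the vectors shown above.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def absorbing_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(M)
    M_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = []
    for row in M:
        hit = sum(row[j] for j in window_states)  # mass entering the window
        M_plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    M_plus.append([0.0] * n + [1.0])
    return M_minus, M_plus


def window_probability(M, start, window_states, window_times, t_end):
    """Probability of being in a window state at least once during window_times."""
    M_minus, M_plus = absorbing_matrices(M, window_states)
    v = [1.0 if i == start else 0.0 for i in range(len(M))] + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p = window_probability(M, start=0, window_states={0, 1}, window_times={2, 3}, t_end=3)
print(round(p, 2))  # 0.96
```

Because worlds that hit the window are moved into the absorbing state, no trajectory is counted twice, and the cost is one sparse vector-matrix product per tick.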

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

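State vectors over (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds with ¬∎ and the classes Si+ the worlds with ∎ (the window has been intersected):

– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$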

107

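Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3?

With ∎ denoting “the trajectory passes the query window” and obs denoting the observation:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare)\cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

State vector at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$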
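The doubled state space and the final Bayes step can be sketched together. This is a hedged illustration: the helper names are hypothetical, the state order (S1-, S2-, S3-, S1+, S2+, S3+) is an assumption, and the second observation is taken to be "seen in s3 at t = 3" as in the example.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def doubled_matrices(M, window_states):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M_minus[i][j] = M[i][j]          # not hit stays not hit
            M_minus[n + i][n + j] = M[i][j]  # hit stays hit
            M_plus[n + i][n + j] = M[i][j]   # hit stays hit
            if j in window_states:
                M_plus[i][n + j] = M[i][j]   # entering the window: becomes hit
            else:
                M_plus[i][j] = M[i][j]
    return M_minus, M_plus


def propagate(M, start, window_states, window_times, t_end):
    n = len(M)
    M_minus, M_plus = doubled_matrices(M, window_states)
    v = [0.0] * (2 * n)
    v[start] = 1.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v


def posterior_hit(v, observed_state):
    """P(window was hit | object observed in observed_state at the final time)."""
    n = len(v) // 2
    miss, hit = v[observed_state], v[n + observed_state]
    return hit / (hit + miss)  # undefined if the observation has probability 0


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
v3 = propagate(M, start=0, window_states={0, 1}, window_times={2, 3}, t_end=3)
print([round(x, 2) for x in v3])
print(round(posterior_hit(v3, observed_state=2), 2))  # 0.89
```

Keeping the hit and not-hit worlds apart per state is what allows conditioning on a later observation without enumerating individual trajectories.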

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) over the time and location axes]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

Event queries

› Application: RFID sensors track individuals in an indoor environment.

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office.

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions.

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
- Cleaning
- Approximation
- Paradigm of equivalent worlds

› Geometric models can be used
- when no probabilistic model is present
- to find possible answers (no confidence)

› Probabilistic management of UST data
- based on stochastic processes
- allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal


Equivalent Worlds: An Intuition

Observation 1

For any possible world and any possible world derived from by changing the position of object the following equivalence holds

Observation 1 allows us to discard objects outside of the query region q. This reduces the remaining number of equivalence classes of possible worlds.

Observation 2

For each remaining object, we only need to consider the predicate "inside". This further reduces the remaining number of equivalence classes of possible worlds.

[Figure: query region q with the remaining objects; object labels C, D, H and I]

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

Generating Functions

[Figure: query region q with uncertain objects A, B and C; the probability labels shown are 0.8, 0.2 and 0.4]

Main idea: use polynomial multiplication to enumerate the possible results.

Observation 3: anonymize the objects by substituting A, B and C with x.

Each monomial $c_i x^i$ of the expanded polynomial implies that the probability of having exactly $i$ results equals $c_i$.

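Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{n} \left(1 - p_i + p_i \cdot x\right) = \sum_{j=0}^{n} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of the monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).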
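The generating-function idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the three inside-probabilities 0.8, 0.2 and 0.4 are assumed values that merely reuse the numbers visible in the running figure. The polynomial product is built incrementally and checked against a brute-force enumeration of all possible worlds.

```python
from itertools import product


def count_distribution(probs):
    """Coefficients c_0..c_n of prod_i (1 - p_i + p_i * x).

    c_j is the probability that exactly j objects are inside the query region,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial "1"
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object outside: count stays j
            nxt[j + 1] += c * p      # object inside: count becomes j + 1
        coeffs = nxt
    return coeffs


def count_distribution_naive(probs):
    """Reference solution: enumerate all 2^n possible worlds explicitly."""
    dist = [0.0] * (len(probs) + 1)
    for world in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for inside, p in zip(world, probs):
            weight *= p if inside else 1.0 - p
        dist[sum(world)] += weight
    return dist


# Assumed inside-probabilities of three independent objects (illustrative only).
probs = [0.8, 0.2, 0.4]
dist = count_distribution(probs)
naive = count_distribution_naive(probs)
```

The incremental product needs O(n²) multiplications, whereas the naive enumeration touches all 2^n worlds; both return the same distribution.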

Count Queries on Uncertain Data

[Figure: query region q with uncertain objects A, B and C; the probability labels shown are 0.8, 0.2 and 0.4]

Example: [step-by-step expansion of the generating function for the objects in the figure; the formulas did not survive extraction]

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.

Approximated Results: Sampling

› Materialize a set S of possible worlds.

› Samples are drawn independently and without bias.

› Evaluate the query predicate on each sampled world.

› The distribution of the sampled results is an unbiased approximation of the true distribution of results.

Sampling: Example

[Figure: query region q with uncertain objects A, B and C; the probability labels shown are 0.8, 0.2 and 0.4]

Drawing 100 possible worlds yields estimators of the result probabilities, which can be compared to the exact probabilities. On their own, the estimators give no indication of the reliability or confidence of the estimation.

Sampling: Confidences

Use statistical methods to assess the quality of the estimators, e.g. the Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$

where $z_{1-\alpha/2}$ is the $100 \cdot (1-\alpha/2)$ percentile of the standard normal distribution.

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638].
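The sampling approach and the Wald interval can be sketched as follows. This is a hedged illustration under stated assumptions: the helper names are hypothetical, the inside-probabilities 0.8, 0.2 and 0.4 are assumed illustrative values, and the estimate 0.54 is inferred here as the midpoint of the interval [0.442, 0.638] rather than read from the slides.

```python
import math
import random


def sample_count_probability(probs, k, n_samples, rng):
    """Estimate P(exactly k objects inside the query region) by sampling worlds."""
    hits = 0
    for _ in range(n_samples):
        # Draw one possible world: each object is inside with its own probability.
        count = sum(1 for p in probs if rng.random() < p)
        if count == k:
            hits += 1
    return hits / n_samples


def wald_interval(p_hat, n, z=1.96):
    """Wald confidence interval; z = 1.96 corresponds to alpha = 0.05."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
p_hat = sample_count_probability([0.8, 0.2, 0.4], k=1, n_samples=100, rng=rng)
low, high = wald_interval(p_hat, 100)

# An estimate of 0.54 from 100 samples reproduces the interval quoted above.
slide_low, slide_high = wald_interval(0.54, 100)
```

The interval narrows with the square root of the number of samples, so halving its width requires four times as many sampled worlds.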

Uncertain Spatial Data Management: Summary

› Motivation
- Flood of geo-spatial data
- Enriched with additional contexts (text, social, multimedia)
- Inherent uncertainty

› Data cleaning
- "Best guess" answers
- Unreliable results
- Biased results

› Paradigm of equivalent worlds
- Efficient solution for the most prominent types of spatial queries
- Example: generating functions

› Approximations
- Monte Carlo sampling
- Probabilistic guarantees

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty - the flip-side

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time (up to the present time) and the corresponding 2D route]

The approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011.

Mobility Models

- Sequence of (location, time) updates (e.g. GPS-based)
- Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
- (location, time, velocity vector) updates

Modeling Uncertainty: Location Uncertainty Models

- Imprecision of various sources (GPS, sensors)
- Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009.

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010.
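A membership test for a single bead (space-time prism) can be sketched as below. This is a minimal sketch assuming free-space motion, Euclidean distance and a known speed bound v_max; the function name and the sample layout (x, y, t) are hypothetical choices made for illustration.

```python
import math


def in_bead(sample_a, sample_b, v_max, x, y, t):
    """Return True if (x, y) is a possible location at time t.

    sample_a and sample_b are consecutive exact samples (x, y, t). A location is
    possible if it is reachable from sample_a by time t and sample_b is still
    reachable from it afterwards, both at a speed of at most v_max.
    """
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = math.hypot(x - xa, y - ya) <= v_max * (t - ta)
    can_still_reach_b = math.hypot(xb - x, yb - y) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b


a = (0.0, 0.0, 0.0)
b = (10.0, 0.0, 10.0)
on_route = in_bead(a, b, 2.0, 5.0, 0.0, 5.0)    # on the straight line between the samples
detour = in_bead(a, b, 2.0, 5.0, 8.0, 5.0)      # a feasible detour
too_far = in_bead(a, b, 2.0, 5.0, 9.0, 5.0)     # farther away than v_max allows
outside_time = in_bead(a, b, 2.0, 5.0, 0.0, 11.0)
```

At a fixed time t the possible locations form the intersection of two disks, which is why the union over all t is shaped like a bead, and a sequence of samples yields a necklace.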

Modeling Uncertain Trajectories: Model 2

Constraints
- Motion along a road network
- Uncertainty restricted to road segments

[Figure: positions p1, p2 and q on a road network, with the times t1 and t2 marked on the time axis t]

Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009).

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004.

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g. possible routes to be taken

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, ..., TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.

Impact of Uncertainty

Given Tr1, Tr2, ..., TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies that
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]; its children are nodes of the form (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm); each child has descendants over sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12)]

Root: the parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

Trq = 2D point Q. The rest are possible locations bounded by circles.

[Figure: crisp query point Q, the uncertainty disks of Tr1 ... Tr5, and the distances Rmin and Rmax]

Observation:
- Any object whose minimum distance from Q is > Rmax has zero probability of being the NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i}(R_d) = \iint_{disk(Q, R_d)} pdf_i(x, y)\,dx\,dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \cdot \prod_{i \neq j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$$

where $pdf^{WD}_{j}$ is the derivative of $P^{WD}_{j}$ with respect to $R_d$. In other words, Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax).

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1 ... Tr5 with the distances Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide).

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane; axis labels 2r and 1/(πr²)]

Example (assuming a uniform pdf of the possible locations): the figure marks just two of the points that have to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for

$$P\left(\lVert V_i - V_q \rVert \le R_d\right)$$

Define Viq = Vi - Vq (cross-correlation) as another random variable. Since Viq = Vi + (-Vq), independence implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq.

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the x-y plane (range -40 to +40); axis labels 4r and 3/(4πr²)]

Let Viq = Vi + (-Vq).

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around the vertical axis (the axis of the pdf values) through their centroids. THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{x,iq}^2 + V_{y,iq}^2$$

which is a hyperbola since A > 0. The coefficients B and C are determined by the relative position at the start of the interval.

Now we have a collection of such distance functions (one for each i).

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) = 0).

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points.

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, with the critical time-points t11 and t12 marked; time on the horizontal axis (starting at tb), distance on the vertical axis]

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 (of TR1 and TR2, critical time-points t11 and t12) and the lower envelope LE34 (of TR3 and TR4, critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
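The notions of critical time-points and lower envelope can be illustrated with a short Python sketch. Note the assumptions: it compares squared distances (which preserves the ordering), it assumes linear motion of the expected locations, and it uses a simple quadratic-time sweep over all pairwise critical time-points rather than the O(N log N) divide-and-conquer construction mentioned above. All names are hypothetical.

```python
import math


def relative_quadratic(p_i, v_i, p_q, v_q):
    """Squared distance between two linearly moving points as (A, B, C),
    meaning d^2(t) = A*t^2 + B*t + C."""
    dx, dy = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    return (vx * vx + vy * vy, 2 * (dx * vx + dy * vy), dx * dx + dy * dy)


def critical_times(f, g, tb, te):
    """Times strictly inside (tb, te) at which f and g are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return [t for t in roots if tb < t < te]


def lower_envelope(funcs, tb, te):
    """Return [(trajectory id, start, end), ...] describing the lower envelope."""
    ids = list(funcs)
    cuts = {tb, te}
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            cuts.update(critical_times(funcs[ids[i]], funcs[ids[j]], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # the ordering cannot change between two cuts
        best = min(ids, key=lambda k: funcs[k][0] * mid * mid + funcs[k][1] * mid + funcs[k][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)  # extend the previous piece
        else:
            envelope.append((best, lo, hi))
    return envelope


# Stationary query at the origin; Tr1 moves away from it, Tr2 approaches it.
funcs = {
    "Tr1": relative_quadratic((1, 0), (1, 0), (0, 0), (0, 0)),
    "Tr2": relative_quadratic((5, 0), (-1, 0), (0, 0), (0, 0)),
}
envelope = lower_envelope(funcs, 0.0, 4.0)
```

In this example the two distance functions cross once at t = 2, so Tr1 forms the lower envelope before that critical time-point and Tr2 afterwards.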

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1 ... TR7 over [tb, te] with the critical time-points t1, t2 and t3; shown are the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree)]

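Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.

$$PP_i(a) = \{\, P_j : tc_{min}(P_j) \le t_{i+1} - t_i \,\}$$

where $tc_{min}(P_j)$ denotes the minimum time cost of path Pj.

Example: if the pdf of selecting a possible path is uniform, each possible path is selected with probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj that a can reach from pi within the time period t - ti and from which a can reach pi+1 within the time period ti+1 - t, i.e.

$$PL_j(a, t) = \{\, p \in P_j : tc_{min}(p_i, p) \le t - t_i \,\wedge\, tc_{min}(p, p_{i+1}) \le t_{i+1} - t \,\}$$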

Combine the two probabilities ...

[Figure: possible paths and possible locations of object a at t = 4, with segment m highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with a uniform pdf):

$$\Pr(a(t) \in S) = \sum_{p \in PP_a(t)} \Pr(p) \cdot \Pr\left(S \cap PL_p(t)\right)$$

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
- The collection of all the possible paths (and their probabilities)
- The probabilistic location function

Example: observe the two possible sequences of edges for going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

Processing

› Data structures
- Edge hash-table
  › Small, in-memory
- Movement R-tree
  › 1D, for each edge
- Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

› Network expansion
- Expansion tree: a tree rooted in q that contains the edges whose network distance to q is smaller than r

› Filtering
- Fetch the movement R-trees of the edges in the expansion tree
- Search for leaf entries whose maximum time interval intersects tq
- The objects pointed to by those leaf entries are candidates

› Refinement
- The candidate set is considerably smaller than the original trajectory dataset
- Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" from q along road-network segments (vertices A to E), annotated with the earliest/latest times of arrival and a possible path that is definitely not part of the answer to the range query]

Temporal-continuous query
- Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
- It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
- The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty - the flip-side

› Data reduction
- Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm." SSHD 1997.

Uncertainty - the other flip-side

› What are the important properties that must not be distorted by the reduction?
- Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
- It is not just a "Z" axis: one cannot travel back in time
- How long may the data at the sink remain "un-fresh"?

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far ...

Stochastic methods for uncertain spatial data
- Probabilistic results: yes
- Time dimension: no

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results: no
- Time dimension: yes

Merge the approaches

› Idea
- Discretize time
- For each point in time, a pdf (pmf) over the possible positions is given

› Examples
- Nearest neighbour queries on uncertain moving objects
- Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

A highway example ...

[Figure: position (pos) of a car over time (t), bounded by the minimum and maximum speeds vmin and vmax, with the query window Q]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution of the position at each point in time, the car is inside Q with probability 50% at the first time instant covered by Q and with probability 30% at the second one.

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

This result includes worlds that violate the speed constraints.

Consideration of dependency: = 0.5

An adequate model

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time.

› A sound mathematical model which can be used to describe the uncertain location of an object over time.

› Many stochastic processes for different settings
- Discrete time + discrete space (e.g. Markov chain)
- Discrete time + continuous space (e.g. Harris chain)
- Continuous time + discrete space (e.g. Markov process)
- Continuous time + continuous space (e.g. Wiener process)

J. L. Doob. Stochastic Processes. Wiley, 1953.

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities (0.4 to the left, 0.2 stay, 0.4 to the right).

› At the border of the wooden board these probabilities are different (0.4 stay, 0.6 to the only neighbour).

› This model is usually learned or given by experts.

Distribution of the ball's position:

› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)

› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› After the 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

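How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state).

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$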

How can we model this?

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
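The board example can be replayed with a few lines of Python. This is a minimal sketch in plain Python (no matrix library); the function names are hypothetical. It rebuilds the 9 x 9 transition matrix from the probabilities given above, propagates the initial distribution, and checks that the distribution (0.08, 0.12, ..., 0.12, 0.08) is stationary, i.e. unchanged by a further hit.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board: rows are 'from bucket', columns 'to bucket'."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border
            M[i][i], M[i][i + 1] = 0.4, 0.6
        elif i == n - 1:                # right border
            M[i][i - 1], M[i][i] = 0.6, 0.4
        else:                           # interior bucket
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit of the board: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # the ball starts in the third bucket
after_1 = step(start, M)
after_2 = step(after_1, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

# Candidate long-run distribution; one more hit leaves it unchanged.
stationary = [0.08] + [0.12] * 7 + [0.08]
stationary_next = step(stationary, M)
```

Because every row of M has at most three non-zero entries, a sparse representation would make each step linear in the number of states, which is the scalability argument made for this model.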

Fusion of Model and Reality

› Discretization of time and space
- E.g. treat intersections as states and add additional states on long streets
- The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
- Transition probabilities from one state to another are learned from historical data (very sparse matrix)
- The transition matrix can change over time and for different object groups

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 with their transition probabilities, unrolled over t = 0, 1, 2, 3]

Note: we have an exponential number of possible paths the car might take.

Starting in s1, the state distribution evolves as follows:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

The worlds in s2 at t = 2 (probability 0.8) already hit the window. Removing them leaves (0, 0, 0.2), which propagates to (0, 0.16, 0.04) at t = 3. Only the remaining 0.04 (in s3) never hits the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

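ST - Window Queries

With the additional state s4, which collects the trajectories that have hit the window:

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4):

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)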
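The two-matrix scheme can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, states are indexed from 0, and it assumes that the query window does not cover the start time t = 0. M_minus is applied for transitions into time steps outside the window, M_plus for transitions into time steps inside it.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def build_window_matrices(M, window_states):
    """Extend M by one absorbing 'hit' state and build M_minus and M_plus."""
    n = len(M)
    hit = n  # index of the additional absorbing state
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if j in window_states:
                M_plus[i][hit] += M[i][j]  # entering the window: redirect to 'hit'
            else:
                M_plus[i][j] = M[i][j]
    M_plus[hit][hit] = 1.0
    return M_minus, M_plus


def window_probability(M, start, window_states, window_times, t_end):
    """Probability of being in window_states at least once during window_times."""
    M_minus, M_plus = build_window_matrices(M, window_states)
    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Car starts in s1; window covers the states s1, s2 during T = [2, 3].
p = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3)
```

For the example above this reproduces the result 0.96 with three vector-matrix multiplications instead of an enumeration of all possible paths.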

Multiple Observations

› So far, we had only one observation from which we could extrapolate.

› This is not really of interest, since cars do not move randomly.

› With two observations, we have to introduce more artificial states and adapt the techniques.

[Figure: location space over time space, first with a single observation at t0, then with observations at t0 and t1]

Multiple Observations

› We need to track where the true-hit worlds are located
- 2|S| classes of equivalent worlds
- One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
- One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

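Multiple Observations

State vectors over the classes (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- correspond to ¬∎ (window not hit) and the classes Si+ to ∎ (window hit):

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$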

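Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (event ∎), given the fact that the object was seen in s3 (observation obs)?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$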
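The doubled state space and the final conditioning step can be sketched in Python as below. This is a sketch under assumptions: the function names are hypothetical, states are indexed from 0, the first n entries of a vector hold the "-" classes and the last n entries the "+" classes, and the second observation (object seen in s3) is assumed to be made at t = 3.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def doubled_matrices(M, window_states):
    """Build M_minus and M_plus over the 2|S| classes (S1-..Sn-, S1+..Sn+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Outside the window's time interval the hit status is preserved.
            M_minus[i][j] = M[i][j]
            M_minus[n + i][n + j] = M[i][j]
            # Inside it, a '-' world entering a window state becomes a '+' world.
            target = n + j if j in window_states else j
            M_plus[i][target] = M[i][j]
            M_plus[n + i][n + j] = M[i][j]
    return M_minus, M_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # at t = 0 the car is in s1, window not hit
v = vec_mat(v, M_minus)             # t = 1 (outside the window)
v = vec_mat(v, M_plus)              # t = 2 (inside the window)
v = vec_mat(v, M_plus)              # t = 3 (inside the window)

# Observation at t = 3: the object is seen in s3 (index 2).
not_hit, hit = v[2], v[3 + 2]
posterior = hit / (hit + not_hit)
```

Conditioning on the observation only requires the two entries of the state vector that belong to the observed state, which is why the hit and not-hit worlds have to be tracked per state.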

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = number of samples that satisfy the query divided by the total number of drawn samples

› But how can samples be drawn efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


32

Equivalent Worlds An intuition

q

Observation 1: Objects outside of the query region can be discarded

Remaining number of equivalent classes of possible worlds

33

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

q

C D

H

I

Querying Uncertain Spatial Data

34

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate "inside"

Remaining number of equivalent classes of possible worlds

q

C D

H

I

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

q

C D

H

I

36

Generating Functions

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

37

Generating Functions

[Figure: query region q; object labels replaced by x; probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

39

Generating Functions

[Figure: query region q; object labels replaced by x; probabilities 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c \cdot x^i$ implies that the probability of having exactly $i$ results equals $c$

40

Generating Functions Formally

For each object $o_i$, let $p_i := P(o_i \text{ is inside the query region})$. Consider the following generating function [2]:

$$\mathcal{F} = \prod_{i=1}^{N} \left(1 - p_i + p_i \cdot x\right)$$

In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
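The polynomial expansion behind a count query can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, and assigning the figure's probabilities 0.8, 0.2 and 0.4 to three independent objects is an assumption made only for the example.

```python
def count_distribution(probs):
    """Expand prod_i (1 - p_i + p_i * x) and return its coefficients.

    coeffs[k] is the probability that exactly k objects lie inside the
    query region, assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial "1"
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: count stays k
            nxt[k + 1] += c * p      # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed example probabilities (taken from the figure labels).
dist = count_distribution([0.8, 0.2, 0.4])
for k, pr in enumerate(dist):
    print(f"P(count = {k}) = {pr:.3f}")
```

Each multiplication step costs time linear in the current degree, so N objects need O(N^2) operations instead of enumerating 2^N possible worlds.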

41

Count Queries on Uncertain Data

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Example

42

Count Queries on Uncertain Data

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Example =

43

Count Queries on Uncertain Data

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Example = =

45

The Paradigm of Equivalent Worlds

Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

Distribution of sampled results is an unbiased approximation of the true distribution of results
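A minimal sketch of this sampling scheme for a count query is given below. The function name, the seed handling and the three probabilities 0.8, 0.2 and 0.4 are assumptions for illustration; each sampled world decides independently for every object whether it lies inside the query region.

```python
import random


def sample_count_distribution(probs, n_samples, seed=0):
    """Estimate P(exactly k objects inside) from sampled possible worlds."""
    rng = random.Random(seed)
    hits = [0] * (len(probs) + 1)
    for _ in range(n_samples):
        # Materialize one possible world and evaluate the count query on it.
        count = sum(1 for p in probs if rng.random() < p)
        hits[count] += 1
    return [h / n_samples for h in hits]


# Assumed example probabilities; 100 worlds as in the sampling example.
estimate = sample_count_distribution([0.8, 0.2, 0.4], n_samples=100)
print(estimate)
```

With only 100 worlds the estimates can deviate noticeably from the exact values, which is why a confidence statement is needed on top of the raw ratios.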

48

Sampling Example

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

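Sampling Confidences

[Figure: query region q; object labels A, B, C, H, H3; probabilities 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of α, the true probability is in the interval [0.442, 0.638]

True probability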
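The interval above can be reproduced with the standard Wald formula. The sketch below is an illustration under stated assumptions: the estimate 0.54 is inferred as the midpoint of the reported interval, n = 100 is the number of drawn worlds, and α = 0.05 is the level under which the formula matches the reported bounds; none of these three values is stated explicitly in the recovered slide text.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(0.54, 100)
print(f"[{low:.3f}, {high:.3f}]")  # [0.442, 0.638]
```

The Wald interval is known to be unreliable when the estimate is close to 0 or 1 or when n is small, so other intervals may be preferable in those cases.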

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions

Approximations
– Monte Carlo sampling
– Probabilistic guarantees

51

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: 3D trajectory in (X, Y, Time) space up to the present time, with its projection onto the X-Y plane as the 2D route]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: positions p1 and p2 observed at times t1 and t2 on a road network, and a location q at an intermediate time t]

Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants T = tb, T = tb1, T = t1, T = t11, T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels subdivide the interval of their parent, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q; uncertainty disks of Tr1, …, Tr5; radii Rmin and Rmax around Q; distance Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_i(R_d) = \iint_{\{(x,y) \,:\, \lVert (x,y) - Q \rVert \le R_d\}} pdf_i(x,y)\, dx\, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d} \cdot \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ, Tr1, …, Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over the (x, y) plane, each on a disk of diameter 2r]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

$$P\left(\lVert V_i - V_q \rVert \le R_d\right)$$

Define Viq = Vi - Vq (cross-correlation) as another random variable. Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq
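The convolution argument can be illustrated on a small discrete example. The sketch below is a simplification under stated assumptions: it uses one-dimensional discrete locations instead of 2D disks, and the names `difference_pmf` and `prob_within` are hypothetical. It computes the distribution of Viq = Vi + (-Vq) for two independent variables and then the probability of being within a given distance.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete variables (a convolution)."""
    out = defaultdict(float)
    for x_i, p_i in pmf_i.items():
        for x_q, p_q in pmf_q.items():
            out[x_i - x_q] += p_i * p_q
    return dict(out)


def prob_within(pmf_diff, r_d):
    """P(|V_i - V_q| <= r_d), read off the difference distribution."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= r_d)


# Assumed toy distributions: both objects are uniform on two positions.
v_i = {0: 0.5, 1: 0.5}
v_q = {0: 0.5, 1: 0.5}
v_iq = difference_pmf(v_i, v_q)
print(v_iq, prob_within(v_iq, 0))
```

Once the difference distribution is available, the uncertain query object can be treated as a crisp point at the origin, which is the reduction the slide describes.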

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane, each on a disk of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

(a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions of TR1 and TR2 over time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
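The "at most two intersections" observation follows because the squared distance functions are quadratics in t. The sketch below is an illustration under stated assumptions: the function names are hypothetical, each trajectory is given by an assumed relative position and relative velocity with respect to the query trajectory at t = 0, and the example coordinates are made up.

```python
import math


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) with d(t)^2 = A*t^2 + B*t + C.

    (x0, y0) is the relative position at t = 0 and (vx, vy) the relative
    velocity, both taken with respect to the query trajectory.
    """
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(f, g, tb, te):
    """Times in [tb, te] at which two distance functions are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Assumed toy trajectories (relative to the query trajectory).
d1 = squared_distance_coeffs(-2.0, 1.0, 1.0, 0.0)   # d^2 = t^2 - 4t + 5
d2 = squared_distance_coeffs(0.0, 2.0, 0.0, -0.5)   # d^2 = 0.25t^2 - 2t + 4
times = critical_times(d1, d2, 0.0, 5.0)
print(times)
```

Between two consecutive critical time-points the order of the two distance functions cannot change, so the lower envelope only needs to be re-evaluated at these points.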

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,

$$PP_i(a) = \{ P_j \mid P_j \text{ connects } p_i \text{ and } p_{i+1},\ tc_{min}(P_j) \le t_{i+1} - t_i \}$$

Example: if the pdf of selecting a possible path is uniform,

$$\Pr(P_j) = \frac{1}{|PP_i(a)|}$$

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj that a can reach from pi within time period t - ti and from which a can reach pi+1 within time period ti+1 - t

73

Combine the two probabilitieshellip

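[Figure: possible paths and possible locations of object a when t = 4; segment m]

Probability of falling inside segment m: p1 = 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\left(a(t) \in S \mid PL_p(t)\right)$$

where both factors follow, e.g., a uniform pdf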

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest arrival and latest departure stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search the leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network around q with vertices A, B, C, D, E; the expansion tree "bubbling" along road-network segments; given the earliest/latest times of arrival, one path is possible but definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink stay "un-fresh"?

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

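A highway example…

[Figure: position (pos) over time (t) with query window Q; the reachable positions are bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time interval?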

85

A highway example…

[Figure: position (pos) over time (t) with query window Q; the reachable positions are bounded by vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: position (pos) over time (t) with query window Q; labels 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t) with query window Q; labels 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t) with query window Q; labels 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob. Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

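› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$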

94

How can we model this

› First hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$

where M is the 9 × 9 transition matrix of the wooden-board example.
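The propagation of the ball's position can be sketched with plain vector-matrix products. The helper names `build_board_chain` and `step` are hypothetical; the matrix entries are the ones of the wooden-board example (0.4 / 0.2 / 0.4 for inner buckets, 0.4 and 0.6 at the borders), and the start position in the third bucket is taken from the example vectors.

```python
def build_board_chain(n=9):
    """Transition matrix of the wooden-board example (row = from bucket)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(v, M):
    """One tick of the chain: row vector v times matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


M = build_board_chain()
v0 = [0.0] * 9
v0[2] = 1.0            # the ball starts in the third bucket
v1 = step(v0, M)       # distribution after the first hit
v2 = step(v1, M)       # distribution after the second hit

# The vector listed for the 40th hit is a fixed point of M.
pi = [0.08] + [0.12] * 7 + [0.08]
pi_next = step(pi, M)

print([round(x, 2) for x in v1])
print([round(x, 2) for x in v2])
```

Because the listed long-run vector is stationary under M, further hits do not change it; how closely the chain has approached it after exactly 40 hits depends on the mixing speed and is not checked here.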

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state transition graph over s1, s2, s3, edges labelled with the probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (s1, s2, s3):
t=0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

102

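ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (s1, s2, s3), where worlds that hit the window at t=2 are not propagated further:
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories (those that have hit the window) and two matrices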

103

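ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4), where s4 is the new state collecting the worlds that have hit the window:
t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)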
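The two-matrix idea can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, states are indexed from 0 (s1 = 0, s2 = 1, s3 = 2), and the sketch assumes that the query interval starts after the start time, so the initial state itself is never tested against the window.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_matrices(M, window_states):
    """Build M- (outside the query time) and M+ (inside the query time).

    One absorbing "hit" state is appended. In M+, every transition that
    ends in a window state is redirected to the hit state.
    """
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in M] + [absorbing[:]]
    m_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window_states:
                new_row[n] += p
            else:
                new_row[j] += p
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(M, start, window_states, t_from, t_to):
    """P(object is in a window state at some tick in [t_from, t_to])."""
    m_minus, m_plus = window_matrices(M, window_states)
    v = [0.0] * (len(M) + 1)
    v[start] = 1.0
    for t in range(1, t_to + 1):  # assumes t_from >= 1
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window_states={0, 1}, t_from=2, t_to=3)
print(round(result, 2))  # 0.96
```

The cost is one sparse vector-matrix product per tick, which avoids enumerating the exponentially many possible paths.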

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space with one observation at t0]

[Figure: location space over time space with two observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

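[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

State vectors over (S1-, S2-, S3-, S1+, S2+, S3+), where S1-, S2-, S3- are the "not ∎" classes:
t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$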

107

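Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Here ∎ is the event that the trajectory intersects the query window and obs is the observation (object seen in s3).

State vector over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$\frac{0.32}{0.32 + 0.04} = 0.89$$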
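The doubled state space and the final Bayes step can be sketched as follows. Function names are hypothetical, states are indexed from 0, and the second observation is assumed to be "seen in s3 at t = 3", which is the reading under which the numbers 0.32 and 0.04 arise.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def doubled_matrices(M, window_states):
    """Matrices over the classes (S1-, ..., Sn-, S1+, ..., Sn+).

    M- keeps every world in its class. M+ moves a "-" world into the
    matching "+" class as soon as it enters a window state.
    """
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] = p
            m_minus[n + i][n + j] = p
            m_plus[n + i][n + j] = p       # once hit, always hit
            if j in window_states:
                m_plus[i][n + j] = p       # first hit of the window
            else:
                m_plus[i][j] = p
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
m_minus, m_plus = doubled_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # observed in s1 at t = 0
v = vec_mat(v, m_minus)              # t = 1, outside the query interval
v = vec_mat(v, m_plus)               # t = 2, inside the query interval
v = vec_mat(v, m_plus)               # t = 3, inside the query interval

observed = 2                          # assumed observation: s3 at t = 3
posterior = v[n + observed] / (v[observed] + v[n + observed])
print([round(x, 2) for x in v], round(posterior, 2))
```

Conditioning on the observation only requires the two entries of the final vector that belong to the observed state, so the extra cost over the single-observation case is the doubled state space.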

108

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales very well, since it is based purely on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be a perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: axes "time" and "location"; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


33

Equivalent Worlds: An intuition

Observation 2: For each remaining object, we only need to consider the predicate "inside".

Remaining number of equivalent classes of possible worlds

Observation 3: We only require the number of objects in the query region. Information about concrete result objects can be discarded.

[Figure: query region q with the remaining uncertain objects C, D, H, I]

Querying Uncertain Spatial Data

36

Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results.

Observation 3: Anonymize objects. Substitute A, B, C by x.

Each monomial $c \cdot x^i$ implies that the probability of having exactly $i$ results equals $c$.

[Figure: query region q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4) and H]

40

Generating Functions: Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left(1 - p_i + p_i \cdot x\right)$$

In the expanded polynomial, the coefficient of the monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
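The expansion of the generating function can be sketched as an iterative polynomial multiplication. The following is an illustrative sketch only: the function name `count_distribution` is hypothetical, objects are assumed to be mutually independent, and the three probabilities 0.8, 0.2 and 0.4 are taken from the labels of the running example.

```python
def count_distribution(probabilities):
    """Coefficients of prod_i (1 - p_i + p_i * x).

    Entry j of the result is the probability that exactly j objects
    are inside the query region.
    """
    coeffs = [1.0]                       # the polynomial "1"
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)      # object outside: count stays j
            nxt[j + 1] += c * p          # object inside: count becomes j + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])       # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication touches every coefficient once, so N objects cost O(N²) operations instead of enumerating 2^N possible worlds.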

41

Count Queries on Uncertain Data

Example

[Figure: query region q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4) and H]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
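A minimal sketch of this sampling procedure for the count query of the running example follows. It assumes mutually independent objects with inside-probabilities 0.8, 0.2 and 0.4; the function name `sample_count_distribution` and the fixed seed are illustrative choices, not part of the tutorial.

```python
import random

def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Estimate P(exactly j objects inside) from sampled possible worlds."""
    rng = random.Random(seed)
    counts = [0] * (len(probabilities) + 1)
    for _ in range(num_worlds):
        # Materialize one possible world: each object is inside or outside.
        inside = sum(1 for p in probabilities if rng.random() < p)
        counts[inside] += 1              # evaluate the count query on it
    return [c / num_worlds for c in counts]

estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

With only 100 worlds the estimates can deviate visibly from the exact values, which is the motivation for attaching confidence intervals to them.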

48

Sampling Example

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

[Figure: query region q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4) and H]

49

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test

where z is the corresponding percentile of the standard normal distribution

At the given significance level, the true probability is in the interval [0.442, 0.638]

True probability

[Figure: query region q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4) and H]
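The Wald interval has the form p̂ ± z · sqrt(p̂(1 − p̂)/n). The sketch below is hedged: the estimator p̂ = 0.54 and the significance level α = 0.05 are assumptions, chosen because together with n = 100 samples they reproduce the interval [0.442, 0.638] quoted above; the function name `wald_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # (1 - alpha/2) percentile
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(p_hat=0.54, n=100)      # assumed estimator value
print(round(low, 3), round(high, 3))              # 0.442 0.638
```

The interval shrinks with the square root of the number of samples, so halving its width requires four times as many possible worlds.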

50

Uncertain Spatial Data Management: Summary

› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

› Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

› Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time up to the present time, together with its 2D route]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: timeline marker "now ->"]

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints
• Motion along road network
• Uncertainty restricted to road segments

[Figure: space-time prism on a road network with the labels p1, p2, q, t1, t2 and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

E.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the axes X, Y and T, with T = tb, T = t1 and T = te marked]

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) exclusively has a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the axes X, Y and T, with T = tb, T = tb1, T = t1, T = t11 and T = te marked]

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121)]

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)

[Figure: point Q with the uncertainty disks Tr1 to Tr5, the radii Rmin, Rmax and Rd, and the label "Rmin > Rmax"]
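The pruning observation can be sketched for circular uncertainty regions as follows. This is an illustration under assumptions: Rmax is taken to be the smallest of the maximal possible distances from Q over all objects, the coordinates are made up, and `prune_nn_candidates` is a hypothetical name.

```python
import math

def prune_nn_candidates(q, disks):
    """Return the names of objects that may still be the NN of the crisp point q.

    disks maps a name to ((center_x, center_y), radius).
    """
    def center_dist(c):
        return math.hypot(c[0] - q[0], c[1] - q[1])

    # Rmax: some object is guaranteed to lie within this distance of q.
    r_max = min(center_dist(c) + r for c, r in disks.values())
    # Keep every object whose minimal possible distance does not exceed Rmax.
    return sorted(name for name, (c, r) in disks.items()
                  if max(center_dist(c) - r, 0.0) <= r_max)

disks = {                      # hypothetical uncertainty disks
    "Tr1": ((2.0, 0.0), 1.0),  # possible distances in [1, 3]
    "Tr2": ((0.0, 3.0), 1.0),  # possible distances in [2, 4]
    "Tr4": ((6.0, 0.0), 1.0),  # possible distances in [5, 7]
}
print(prune_nn_candidates((0.0, 0.0), disks))   # ['Tr1', 'Tr2']
```

Only the surviving candidates need to enter the integral over Rd, and the integration can stop at Rmax.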

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

dist(Z1, Z2) < Rmax

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

[Figure: uncertain TrQ with the uncertainty disks Tr1 to Tr5, the radii Rmin and Rmax, and the points Z1, Z2]

66

Example (assuming a uniform pdf of possible locations): these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) drawn over the x-y plane, axes x, y and pdf]
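A discrete, one-dimensional analogue may make the convolution argument concrete. The sketch below is not the 2D continuous case of the slide: the probability mass functions, the cell grid and the names `convolve`, `pmf_i` and `pmf_q` are all hypothetical, and Vi and Vq are assumed independent.

```python
def convolve(a, b):
    """Discrete convolution of two probability mass functions given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

pmf_i = [0.25, 0.5, 0.25]     # Vi on the cells {0, 1, 2}
pmf_q = [0.7, 0.3]            # Vq on the cells {0, 1}

# The pmf of -Vq lives on the cells {-1, 0}: reverse the list.
pmf_neg_q = pmf_q[::-1]

# pmf of Viq = Vi + (-Vq) on the cells {-1, 0, 1, 2}.
pmf_diff = convolve(pmf_i, pmf_neg_q)
offset = -(len(pmf_q) - 1)    # cell that pmf_diff[0] refers to

r_d = 1                       # "within distance Rd" in this 1D setting
prob_within = sum(p for k, p in enumerate(pmf_diff) if abs(k + offset) <= r_d)
print([round(p, 3) for p in pmf_diff], round(prob_within, 3))
```

Once the pdf of the difference is available, the uncertain query object can be treated as a crisp point at the origin, which brings the problem back to the previous case.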

67

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) drawn over the x-y plane, axes x, y and pdf; support diameter labeled 4r]

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

which is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis), their lower envelope LE12, and the critical time-points t11 and t12 after tb]
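The "at most twice" observation follows from comparing squared distances, which are quadratic in t. The sketch below assumes linear motion of the expected relative location; the coefficient formulas, the sample numbers and the names `squared_distance_coeffs` and `critical_time_points` are illustrative.

```python
import math

def squared_distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A*t^2 + B*t + C for the point rel_pos + t * rel_vel."""
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_time_points(f, g, tb, te):
    """Times in [tb, te] at which two distance functions intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:                       # difference is linear or constant
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

d1 = squared_distance_coeffs((-4.0, 1.0), (1.0, 0.0))   # passes close to the origin
d2 = squared_distance_coeffs((0.0, 3.0), (0.0, 0.0))    # stays at distance 3
times = critical_time_points(d1, d2, tb=0.0, te=10.0)
print([round(t, 3) for t in times])                      # [1.172, 6.828]
```

Between the two critical time-points the first function is the lower envelope, outside of them the second one is.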

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1, TR2, critical time-points t11, t12) and LE34 (of TR3, TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with the critical times t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform, each possible path is selected with probability 1/|PPi(a)|

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)
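The definition of possible paths can be sketched as a bounded depth-first enumeration. The road network, its travel times and the function name `possible_paths` below are hypothetical; edge weights are assumed to be minimal travel times, and only simple paths are considered.

```python
def possible_paths(graph, source, target, time_budget):
    """All simple paths from source to target whose minimal time cost fits the budget.

    graph maps a vertex to {neighbor: minimal travel time}.
    """
    paths = []

    def extend(path, cost):
        vertex = path[-1]
        if vertex == target:
            paths.append((tuple(path), cost))
            return
        for nxt, travel_time in graph.get(vertex, {}).items():
            if nxt not in path and cost + travel_time <= time_budget:
                path.append(nxt)
                extend(path, cost + travel_time)
                path.pop()

    extend([source], 0.0)
    return paths

graph = {                                   # hypothetical road network
    "p_i": {"a": 2.0, "b": 3.0},
    "a": {"p_next": 5.0, "b": 1.0},
    "b": {"p_next": 2.0},
}
pp = possible_paths(graph, "p_i", "p_next", time_budget=6.0)
uniform_prob = 1.0 / len(pp)                # uniform pdf over the possible paths
print(pp, uniform_prob)
```

The direct route via a alone costs 7 time units and is therefore not a possible path; the two remaining paths each receive probability 0.5 under the uniform assumption.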

73

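Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with the segment m-p1]

Probability of a falling inside segment m-p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP} \Pr(p) \cdot \Pr(a(t) \in S \mid p)$$

where the second factor is evaluated over the possible locations $PL_p(t)$ on path p (e.g., uniform pdf)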

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with the objects A to E and the query q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long should the sink allow the data to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

What is the probability that the car is in some area at least once during some time?

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, with the query window Q. Assuming a uniform distribution, the window covers 50% and 30% of the possible positions at two points in time]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

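› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (row: from bucket, column: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$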

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
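The vector-matrix products above can be reproduced with a few lines of plain Python. This is a sketch with hypothetical helper names (`build_board_matrix`, `step`); instead of asserting the rounded 40-hit figures, it checks that the vector quoted for the 40th hit is left unchanged by M, i.e., that it is the stationary distribution the chain approaches.

```python
def build_board_matrix(n=9):
    """Transition matrix of the wooden-board example (row: from, column: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first = step(start, M)
second = step(first, M)

stationary = [0.08] + [0.12] * 7 + [0.08]
after_hit = step(stationary, M)          # stays (numerically) the same vector
print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
```

Because the matrix is tridiagonal, each step only needs the non-zero entries, which is what makes sparse matrix multiplication attractive for large state spaces.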

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3, unrolled over t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

State distributions: t=0: (1, 0, 0); t=1: (0, 0, 1); t=2: (0, 0.8, 0.2); t=3: (0.48, 0.16, 0.36)

Worlds that have not hit the window at t=2, propagated to t=3: (0, 0.16, 0.04). Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

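$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4), where s4 is the new state: t=0: (1, 0, 0, 0); t=1: (0, 0, 1, 0); t=2: (0, 0, 0.2, 0.8); t=3: (0, 0, 0.04, 0.96)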
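How the two matrices arise from M and how they are applied can be sketched as follows. The helper names `build_matrices`, `vec_mat` and `window_probability` are hypothetical; the sketch assumes that the start time lies outside the query interval, as in the example.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def build_matrices(m, window_states):
    """Add an absorbing "hit" state and return (M^-, M^+)."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    # M^-: plain transitions, used at times outside the query interval.
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    # M^+: transitions into a window state are redirected to the new state.
    m_plus = []
    for row in m:
        hit = sum(row[j] for j in window_states)
        m_plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(m, start, window_states, window_times, horizon):
    m_minus, m_plus = build_matrices(m, window_states)
    v = list(start) + [0.0]
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1]                          # mass collected in the new state

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
prob = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, horizon=3)
print(round(prob, 2))                     # 0.96
```

The number of multiplications grows linearly with the query horizon, although the number of possible paths grows exponentially.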

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: location over time, with index entries for object o]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaption of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


34

Equivalent Worlds An intuition

Observation 2

For each remaining object, we only need to consider the predicate “inside”

Remaining number of equivalent classes of possible worlds

[Figure: query region q with the remaining uncertain objects C, D, H, I]

35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about concrete result objects can be discarded.

[Figure: query region q with the remaining uncertain objects C, D, H, I]

36

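Generating Functions

[Figure: query region q; uncertain objects A, B, C labelled with the probabilities 0.8, 0.2 and 0.4 of being inside q; further labels H, H3]

Main idea: Use polynomial multiplication to enumerate possible results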

37

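Generating Functions

[Figure: query region q; objects anonymized as x, labelled with the probabilities 0.8, 0.2 and 0.4 of being inside q; further labels H, H3]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x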


39

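Generating Functions

[Figure: query region q; objects anonymized as x, labelled with the probabilities 0.8, 0.2 and 0.4 of being inside q; further labels H, H3]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects, i.e. substitute A, B, C by x

Each monomial c·x^i of the expanded product implies that the probability of having exactly i results equals c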

40

Generating Functions Formally

For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i} (1 - p_i + p_i x)$$

In the expanded polynomial, the coefficient of monomial x^j equals the probability that exactly j objects are inside the query region

[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
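The polynomial expansion can be sketched in a few lines. This is only an illustration, assuming mutually independent objects and taking the three probabilities 0.8, 0.2 and 0.4 from the running example; the function name `count_distribution` is hypothetical.

```python
def count_distribution(probabilities):
    """Expand prod_i (1 - p_i + p_i * x) and return its coefficients.

    coeffs[j] is the probability that exactly j objects are inside the
    query region, assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)   # object is outside: count stays j
            nxt[j + 1] += c * p       # object is inside: count becomes j + 1
        coeffs = nxt
    return coeffs


# Probabilities of the three uncertain objects in the running example.
distribution = count_distribution([0.8, 0.2, 0.4])
for j, prob in enumerate(distribution):
    print(f"P(count = {j}) = {prob:.3f}")
```

Each multiplication touches every coefficient once, so n objects cost O(n²) operations instead of enumerating 2^n possible worlds.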

41

Example

q

08 02

04

H

C A

B

H3

Count Queries on Uncertain Data

42

q

08 02

04

H

C A

B

H3

Example =

Count Queries on Uncertain Data

43

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds, such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
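The sampling procedure above can be sketched as follows. It is a minimal illustration, assuming independent objects with the probabilities 0.8, 0.2 and 0.4 of the running example and a count query; the function name and the fixed seed are choices made for this example only.

```python
import random


def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Estimate P(count = j) by materializing num_worlds possible worlds."""
    rng = random.Random(seed)
    counts = [0] * (len(probabilities) + 1)
    for _ in range(num_worlds):
        # One possible world: each object is independently inside or outside.
        inside = sum(1 for p in probabilities if rng.random() < p)
        counts[inside] += 1
    return [c / num_worlds for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=20000)
for j, prob in enumerate(estimate):
    print(f"estimated P(count = {j}) = {prob:.3f}")
```

With only 100 worlds, as on the following slides, the estimates fluctuate noticeably, which is why a confidence statement is needed.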

48

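Sampling Example

[Figure: query region q; uncertain objects A, B, C labelled with the probabilities 0.8, 0.2 and 0.4 of being inside q; further labels H, H3]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations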

49

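Sampling Confidences

[Figure: query region q; uncertain objects A, B, C labelled with the probabilities 0.8, 0.2 and 0.4 of being inside q; further labels H, H3]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test, where z is the percentile of the standard normal distribution that corresponds to the chosen significance level

At the chosen significance level, the true probability is in the interval [0.442, 0.638]

True probability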
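A Wald interval can be computed as sketched below. The slide text does not preserve the estimate or the significance level; the values used here (estimate 0.54 from 100 samples, significance level 0.05) are assumptions chosen because they reproduce the interval [0.442, 0.638].

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal percentile
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed values: 54 of 100 sampled worlds satisfy the query, alpha = 0.05.
lower, upper = wald_interval(p_hat=0.54, n=100, alpha=0.05)
print(f"[{lower:.3f}, {upper:.3f}]")
```

The Wald interval relies on a normal approximation and becomes unreliable for estimates close to 0 or 1 or for very few samples.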

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– “Best guess” answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its projection, the 2d-ROUTE, in the (X, Y) plane]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories, Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with an intermediate position q at time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories, Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with T = t1 marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 with uncertainty in (X, Y, T) space between T = tb and T = te, with T = tb1, T = t1 and T = t11 marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children that subdivide its time interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and these have children in turn, e.g. (Tri121, t11, t121, D121), …]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P_i(R_d) = \iint_{dist((x,y),Q) \le R_d} pdf_i(x, y)\, dx\, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \left(1 - P_i(R_d)\right) dR_d$$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
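The pruning observation can be sketched as below. This is an illustration only: it assumes that Rmax is the smallest of the objects' maximum distances to Q, that each uncertainty region is a disk given as (center x, center y, radius), and the object names and coordinates are made up.

```python
import math


def prune_nn_candidates(q, disks):
    """Return (r_max, candidates) for a crisp query point q.

    disks maps an object name to (cx, cy, r). An object is kept only if its
    minimum possible distance to q does not exceed r_max, the smallest
    maximum distance of any object to q.
    """
    def center_distance(disk):
        return math.hypot(disk[0] - q[0], disk[1] - q[1])

    r_max = min(center_distance(d) + d[2] for d in disks.values())
    candidates = [
        name for name, d in disks.items()
        if max(0.0, center_distance(d) - d[2]) <= r_max
    ]
    return r_max, candidates


# Hypothetical example: Tr4 is too far away to ever be the nearest neighbor.
example_disks = {"Tr1": (2.0, 0.0, 1.0), "Tr2": (0.0, 3.0, 1.0), "Tr4": (10.0, 0.0, 1.0)}
r_max, candidates = prune_nn_candidates((0.0, 0.0), example_disks)
print(r_max, candidates)
```

Only the surviving candidates need to enter the integration over Rd.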

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin, Rmax and two possible locations Z1, Z2 such that dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r in the (x, y) plane]

Example: assuming a uniform pdf of possible locations, but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi is within distance Rd from Vq.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
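The convolution argument can be illustrated in one dimension with discrete distributions. This is a simplified sketch, not the 2D construction of the slides; it assumes integer-valued, independent locations, and the function name and the example distributions are hypothetical.

```python
import numpy as np


def difference_pmf(pmf_i, lo_i, pmf_q, lo_q):
    """Distribution of V_i - V_q for independent integer-valued V_i and V_q.

    pmf_x[k] is P(V_x = lo_x + k). Returns (pmf, lo) in the same layout.
    """
    pmf_i = np.asarray(pmf_i, dtype=float)
    pmf_q = np.asarray(pmf_q, dtype=float)
    hi_q = lo_q + len(pmf_q) - 1
    # -V_q lives on [-hi_q, -lo_q]; its pmf is pmf_q read backwards.
    pmf = np.convolve(pmf_i, pmf_q[::-1])
    return pmf, lo_i - hi_q


# V_i uniform on {10, ..., 14}, V_q uniform on {0, ..., 4}.
pmf, lo = difference_pmf([0.2] * 5, 10, [0.2] * 5, 0)
for k, prob in enumerate(pmf):
    print(f"P(V_i - V_q = {lo + k}) = {prob:.2f}")
```

The result is a triangular distribution centred at the difference of the two means, which matches the centroid property stated on the next slide.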

67

[Figure: the convolved pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) in the (x, y) plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{xiq}^2 + V_{yiq}^2$$

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis) with their lower envelope LE12; tb, t11 and t12 are marked on the time axis, and the intersection of TR1 and TR2 is a critical time-point]
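The critical time-points of two distance functions can be found by solving one quadratic equation, which is why there are at most two of them. The sketch below assumes linear motion relative to the query, so that the squared distance has the form A·t² + B·t + C; the function names are hypothetical.

```python
import math


def distance_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A*t^2 + B*t + C for a point that starts at
    (x0, y0) and moves with constant velocity (vx, vy); distance is
    measured from the origin (the querying object's frame)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which the two distance functions intersect."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:                       # difference is linear (or constant)
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


moving_away = distance_coeffs(1.0, 0.0, 1.0, 0.0)     # d(t) = 1 + t
approaching = distance_coeffs(4.0, 0.0, -1.0, 0.0)    # d(t) = |4 - t|
stationary = distance_coeffs(0.0, 2.0, 0.0, 0.0)      # d(t) = 2
print(critical_times(moving_away, approaching, 0.0, 10.0))
print(critical_times(moving_away, stationary, 0.0, 10.0))
```

Between two consecutive critical time-points the order of the distance functions does not change, so the lower envelope only needs to be re-evaluated at these points.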

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations at t = 4]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with a uniform pdf):

$$\Pr(a \in S \text{ at } t) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\left(a \in S \cap PL_P(a, t)\right)$$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q; the expansion tree is “bubbling” along road-network segments; based on the earliest/latest times of arrival, a path can be possible but definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
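The idea of trading a bounded error for a smaller data size can be sketched with the classic recursive Douglas-Peucker line simplification. Note that this is the textbook algorithm, not the sped-up variant of the cited paper, and it ignores the time dimension; the example track is made up.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep a subset of points so that no dropped point is further than
    epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right   # the split point appears in both halves


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, epsilon=0.1))
print(douglas_peucker(track, epsilon=3.0))
```

The tolerance epsilon plays the role of the acceptable error: every dropped location is then known only up to that bound, which is itself a source of uncertainty.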

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long should the sink tolerate the data being “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) with the query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) with the query region Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) over time (t) with the query region Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

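A highway example…

[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65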

87

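A highway example…

[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints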

88

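A highway example…

[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5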

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: inner holes with the probabilities 0.4 (neighbour), 0.2 (stay), 0.4 (neighbour); border holes with the probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

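› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$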

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M² = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M⁴⁰ = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

where M is the 9 × 9 transition matrix of the example, with rows (0.4, 0.6, 0, …) for the first bucket, (…, 0.4, 0.2, 0.4, …) for the inner buckets and (…, 0, 0.6, 0.4) for the last bucket.
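The propagation of the ball's distribution can be reproduced with a few matrix products. The sketch below is illustrative and uses NumPy; the helper name is hypothetical. The values listed for the 40th hit are those of the chain's stationary distribution, which the last check verifies; the exact distribution after 40 hits is computed but not asserted to match to the printed precision.

```python
import numpy as np


def ball_transition_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from, cols: to)."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1], M[i, i], M[i, i + 1] = 0.4, 0.2, 0.4
    M[0, 0], M[0, 1] = 0.4, 0.6               # left border
    M[n - 1, n - 2], M[n - 1, n - 1] = 0.6, 0.4   # right border
    return M


M = ball_transition_matrix()
start = np.zeros(9)
start[2] = 1.0                                   # ball starts in the third bucket

after_1 = start @ M
after_2 = start @ np.linalg.matrix_power(M, 2)
after_40 = start @ np.linalg.matrix_power(M, 40)
stationary = np.array([0.08] + [0.12] * 7 + [0.08])

print(np.round(after_1, 2))
print(np.round(after_2, 2))
print(np.round(after_40, 2))
print(np.allclose(stationary @ M, stationary))   # stationary distribution check
```

Because the state vector is a row vector, it is multiplied from the left; one multiplication corresponds to one hit of the board.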

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over s1, s2, s3, and the states unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over s1, s2, s3, and the states unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over s1, s2, s3, and the states unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over s1, s2, s3, and the states unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over s1, s2, s3, and the states unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0, 0.16, 0.04), propagating only the worlds that have not yet hit the window at t=2

Result = 0.8 + 0.16 = 0.96

› A solution based on matrix multiplications introduces a new state for the winner trajectories, and two matrices

103

ST - Window Queries

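ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4), where s4 is the new state that collects the trajectories that have hit the window:

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)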
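The two-matrix scheme can be sketched as follows. This is an illustrative reading of the slides, not the authors' implementation: it assumes that M+ is applied for every transition into a time step of the query interval and M- otherwise, and the function name and argument layout are hypothetical.

```python
import numpy as np


def window_query_probability(M, start, window_states, window_times, t_end):
    """Probability that the chain visits a window state at a window time.

    An extra absorbing state (index n) collects all worlds that have hit the
    query window. Returns (probability, final state vector).
    """
    n = M.shape[0]
    M_minus = np.zeros((n + 1, n + 1))
    M_minus[:n, :n] = M
    M_minus[n, n] = 1.0
    M_plus = M_minus.copy()
    for s in window_states:
        M_plus[:n, n] += M_plus[:n, s]   # redirect transitions into the window
        M_plus[:n, s] = 0.0
    v = np.zeros(n + 1)
    if 0 in window_times and start in window_states:
        v[n] = 1.0                       # already a hit at t = 0
    else:
        v[start] = 1.0
    for t in range(1, t_end + 1):
        v = v @ (M_plus if t in window_times else M_minus)
    return v[n], v


M3 = np.array([[0.0, 0.0, 1.0],
               [0.6, 0.0, 0.4],
               [0.0, 0.8, 0.2]])
# Car starts in s1 (index 0); window = {s1, s2} during T = [2, 3].
hit_probability, final_vector = window_query_probability(M3, 0, [0, 1], {2, 3}, 3)
print(round(hit_probability, 2), np.round(final_vector, 2))
```

The cost is one sparse vector-matrix product per tick, independent of the exponential number of possible paths.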

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

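State vectors over the classes (S1-, S2-, S3-, S1+, S2+, S3+); the classes Si- hold the worlds that have not hit the window (¬∎):

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$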

107

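Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Here obs denotes the observation of the object in s3 at t=3. With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$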
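The 2|S| class construction and the final Bayes step can be sketched as below. The block layout of M- and M+ follows the reading of the slides given above; the function name, the use of NumPy and the 0-based state indices are choices made for this illustration.

```python
import numpy as np


def build_class_matrices(M, window_states):
    """Matrices over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    Z = np.zeros_like(M)
    to_hit = np.zeros_like(M)
    to_hit[:, window_states] = M[:, window_states]   # transitions into the window
    stay = M - to_hit                                # transitions that miss it
    M_minus = np.block([[M, Z], [Z, M]])             # time step outside T
    M_plus = np.block([[stay, to_hit], [Z, M]])      # time step inside T
    return M_minus, M_plus


M3x3 = np.array([[0.0, 0.0, 1.0],
                 [0.6, 0.0, 0.4],
                 [0.0, 0.8, 0.2]])
M_minus, M_plus = build_class_matrices(M3x3, [0, 1])   # window = {s1, s2}

v = np.array([1.0, 0, 0, 0, 0, 0])      # observed in s1 at t = 0, no hit yet
for t in (1, 2, 3):
    v = v @ (M_plus if t in (2, 3) else M_minus)

n = M3x3.shape[0]
observed = 2                            # second observation: s3 at t = 3
posterior = v[n + observed] / (v[observed] + v[n + observed])
print(np.round(v, 2), round(posterior, 2))
```

Keeping hit and non-hit worlds apart per state is what makes the conditioning possible: the second observation selects two entries of the vector, and their ratio is the posterior.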

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entry of an object in the time/location space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


35

Equivalent Worlds An intuition

Observation 3

We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

[Figure: query region q with the objects C, D, H, I]

36

Generating Functions

[Figure: query region q with the uncertain objects A, B, C and the probability labels 0.8, 0.2, 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects: substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i x) = \sum_{j=0}^{N} c_j x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
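The expansion can be sketched as repeated polynomial multiplication. The snippet below is an illustration only: it assumes independent objects, and it assumes that the labels 0.8, 0.2 and 0.4 from the running example are the probabilities of the three objects being inside the query region. The function name `count_distribution` is hypothetical.

```python
def count_distribution(probabilities):
    """Coefficients c_0..c_N of prod_i (1 - p_i + p_i * x).

    c_j is the probability that exactly j objects satisfy the query,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)   # object is outside: degree unchanged
            nxt[j + 1] += c * p       # object is inside: degree + 1
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication costs time linear in the current degree, so the full count distribution takes O(N^2) time instead of enumerating 2^N possible worlds.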

41

Count Queries on Uncertain Data

[Figure: query region q with the uncertain objects A, B, C and the probability labels 0.8, 0.2, 0.4]

Example: [Equation: step-by-step expansion of the generating function for the objects A, B, C]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
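A minimal sampling sketch for the count query of the running example, assuming independent objects whose probabilities of being inside the query region are 0.8, 0.2 and 0.4 (an assumption about the figure labels). The function name `sample_count_distribution` and the fixed seed are illustrative choices.

```python
import random


def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Estimate P(exactly k objects in the query region) by sampling worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = [0] * (len(probabilities) + 1)
    for _ in range(num_worlds):
        # Materialize one possible world: each object is inside with its own probability.
        k = sum(1 for p in probabilities if rng.random() < p)
        counts[k] += 1
    return [c / num_worlds for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

With only 100 worlds the estimates can deviate noticeably from the exact values; increasing `num_worlds` narrows the gap at a linear cost in run time.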

48

Sampling Example

[Figure: query region q with the uncertain objects A, B, C and the probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

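Sampling Confidences

[Figure: query region q with the uncertain objects A, B, C and the probability labels 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]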

True probability
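The Wald interval can be computed with the Python standard library. The inputs below are assumptions chosen because they reproduce the interval [0.442, 0.638] quoted above: an estimate of 0.54 from n = 100 sampled worlds at a significance level of 0.05. The slide text does not preserve these values, so treat them as illustrative.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha):
    """Wald confidence interval p_hat +/- z_(1-alpha/2) * sqrt(p_hat*(1-p_hat)/n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of the standard normal
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(p_hat=0.54, n=100, alpha=0.05)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The interval shrinks with the square root of n, so halving its width requires four times as many sampled worlds.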

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: “best guess” answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: 3D trajectory over the axes X, Y and Time up to the present time, together with its projection onto the X-Y plane (2D route)]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories
Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories
Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside a range (disk) around “Q” between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: the trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children covering sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12), which in turn have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd marked; for the pruned object, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by [Equation: double integral of the pdf of Tri over the disk of radius Rd around Q]

The probability that a given object Trj is the NN of Q is given by [Equation: integral over all possible distances Rd]

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd’s (NOTE: the upper bound is Rmax…)
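The two probabilities described above can be written out as follows. This is a hedged reconstruction from the verbal description, not a copy of the slide: the symbols $P^{WD}_i$ (within-distance probability), $pdf_i$ (location pdf of $Tr_i$) and $P^{NN}_j$ are notation assumed here for illustration, and the objects are assumed to be mutually independent.

$$P^{WD}_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$

$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d} \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

The derivative term is the density of $Tr_j$ being at distance exactly $R_d$ from Q, and the product is the probability that every other object is farther away than $R_d$.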

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin and Rmax marked and two points Z1, Z2 such that dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r, plotted over the axes x, y and pdf]

Example: assuming a uniform pdf of possible locations; the figure shows just two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
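The convolution argument can be illustrated in a simplified discrete, one-dimensional setting. The sketch below is not the 2D continuous case from the slides: it assumes two independent discrete location distributions (made-up values) and builds the distribution of their difference by convolving one pmf with the mirrored other. The names `difference_pmf` and `prob_within` are hypothetical.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete V_i and V_q.

    This is the convolution of pmf(V_i) with pmf(-V_q).
    """
    out = defaultdict(float)
    for x_i, p_i in pmf_i.items():
        for x_q, p_q in pmf_q.items():
            out[x_i - x_q] += p_i * p_q
    return dict(out)


def prob_within(pmf_diff, r_d):
    """P(|V_i - V_q| <= r_d), read directly off the difference distribution."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= r_d)


pmf_i = {4: 0.25, 5: 0.5, 6: 0.25}   # hypothetical locations of Tr_i
pmf_q = {0: 0.5, 1: 0.5}             # hypothetical locations of Tr_q
diff = difference_pmf(pmf_i, pmf_q)
print(diff)                  # {4: 0.375, 3: 0.125, 5: 0.375, 6: 0.125}
print(prob_within(diff, 4))  # 0.5
```

Once the difference distribution is available, the uncertain query object can be treated as a crisp point at the origin, which is the point of the transformation.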

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over disks of diameter 4r, plotted over the axes x, y and pdf]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}$$

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time points

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, with a critical time point at t11; time dimension on the horizontal axis, distance on the vertical axis]
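Why two distance functions meet at most twice can be seen by comparing squared distances: for linear relative motion each squared distance is a quadratic in t, so their difference has at most two roots. The sketch below assumes exactly that linear-motion model; the helpers `squared_distance_coeffs` and `critical_time_points` and the sample values are illustrative, not part of the tutorial.

```python
import math


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d^2(t) = A*t^2 + B*t + C for the relative
    position (x0 + vx*t, y0 + vy*t) measured from the origin."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(q_i, q_j, tb, te):
    """Times in [tb, te] where the two distance functions intersect."""
    a, b, c = (q_i[k] - q_j[k] for k in range(3))
    if abs(a) < 1e-12:                       # difference is linear or constant
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []                       # the functions never intersect
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


d1 = squared_distance_coeffs(-5.0, 1.0, 1.0, 0.0)   # object passing by the origin
d2 = squared_distance_coeffs(3.0, 0.0, 0.0, 0.0)    # object at constant distance 3
print(critical_time_points(d1, d2, tb=0.0, te=10.0))
```

Between consecutive critical time points the ordering of the two distance functions cannot change, which is what makes the lower envelope piecewise well defined.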

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) are merged into LE1234 over [tb, te], which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the critical time points t1, t2, t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform, every path in PPi is equally likely

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

[Figure: possible paths of object a and its possible locations when t = 4, with a segment m p1 marked]

Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(S \mid PL_p(t)\big)$$

e.g. uniform pdf

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree “bubbling” along road-network segments around q, with the vertices A, B, C, D, E annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997
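For orientation, the basic recursive Douglas-Peucker line simplification can be sketched as below. This is the textbook variant, not the sped-up algorithm from the cited paper, and the sample track is made up for illustration. The parameter `epsilon` plays the role of the acceptable error.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only points needed so that dropped points stay within epsilon."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1  # farthest inner point
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]      # everything in between can be dropped
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right                # avoid duplicating the split point


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(track, epsilon=0.5)
print(simplified)
```

Every dropped sample lies within `epsilon` of the simplified polyline, so the reduction itself introduces a bounded location uncertainty, which is the flip-side the slide refers to.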

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long should the sink's data be allowed to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) of a car whose reachable positions are bounded by vmin and vmax; the query window Q is annotated with 50% and 30%]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

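› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; an inner hole has the probabilities 0.4 / 0.2 / 0.4 (left neighbour / stay / right neighbour), a border hole has the probabilities 0.6 (neighbour) and 0.4 (stay)]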

92

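A simple example

› Initial position: the ball is in one hole with probability 1

› After first hit: 0.4, 0.2, 0.4 for the left neighbour, the same hole and the right neighbour

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 over five neighbouring holes

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 over the nine holes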

93

How can we model this

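› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$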

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
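The hits can be replayed with plain vector-matrix products. The sketch below rebuilds the 9-bucket matrix from its band structure and uses hypothetical helper names (`build_matrix`, `step`). It also checks that the vector quoted for the 40th hit is a fixed point of M, i.e. the stationary distribution that repeated hits approach.

```python
def build_matrix(n=9):
    """Tridiagonal transition matrix of the wooden-board example."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6            # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4    # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(vec, m):
    """One hit: row vector times transition matrix."""
    n = len(vec)
    return [sum(vec[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
state = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball starts in the third bucket
first = step(state, M)
second = step(first, M)
print([round(x, 2) for x in first])    # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in second])   # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in step(stationary, M)] == stationary)  # True
```

Because M is sparse (three entries per row), each hit costs time linear in the number of buckets rather than quadratic.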

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3, unrolled over t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

State distributions, starting with (1, 0, 0) at t=0:
– t=1: (0, 0, 1)
– t=2: (0, 0.8, 0.2)
– t=3: (0.48, 0.16, 0.36)

The worlds that are not in the window at t=2 (probability 0.2, all in s3) are distributed as (0, 0.16, 0.04) at t=3, so only 0.04 of the probability mass never intersects the window: Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

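State order: (s1, s2, s3, s4), where s4 is the new absorbing state for the trajectories that have hit the window

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors:
– t=0: (1, 0, 0, 0)
– t=1: (0, 0, 1, 0)
– t=2: (0, 0, 0.2, 0.8)
– t=3: (0, 0, 0.04, 0.96)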
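The two-matrix scheme can be replayed as follows. The sketch assumes that M+ is applied for every transition that ends at a time inside the window T = [2, 3] and M- otherwise; the function names `vec_mat` and `window_query` are hypothetical.

```python
# Augmented matrices over (s1, s2, s3, s4); s4 absorbs the worlds that hit the window.
M_MINUS = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
M_PLUS = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.4, 0.6],   # transitions into s1 or s2 are redirected to s4
          [0.0, 0.0, 0.2, 0.8],
          [0.0, 0.0, 0.0, 1.0]]


def vec_mat(vec, m):
    """Row vector times matrix."""
    return [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]


def window_query(start_vec, window_times, horizon):
    """State vectors for t = 0..horizon; the last entry is the hit probability so far."""
    history = [list(start_vec)]
    for t in range(1, horizon + 1):
        matrix = M_PLUS if t in window_times else M_MINUS
        history.append(vec_mat(history[-1], matrix))
    return history


history = window_query([1.0, 0.0, 0.0, 0.0], window_times={2, 3}, horizon=3)
for t, vec in enumerate(history):
    print(t, [round(x, 2) for x in vec])
print(round(history[-1][3], 2))  # 0.96
```

The number of products grows linearly with the length of the time horizon, in contrast to the exponential number of possible paths.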

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over location and time, with leaf entries pointing to object ids (oid)]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


36

Generating Functions

[Figure: query q and uncertain objects A, B, C, annotated with the values 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

37

Generating Functions

[Figure: query q and three uncertain objects, each now labelled x, annotated with the values 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects. Substitute A, B, C by x


39

Generating Functions

[Figure: query q and three uncertain objects, each labelled x, annotated with the values 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects. Substitute A, B, C by x

Each monomial $c \cdot x^i$ implies that the probability of having exactly $i$ results equals $c$

40

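Generating Functions Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ satisfies the query predicate. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} (1 - p_i + p_i \cdot x) = \sum_{j=0}^{N} c_j \cdot x^j$$

In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)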
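The polynomial expansion can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `count_distribution` is hypothetical, and the input probabilities 0.8, 0.2 and 0.4 are taken from the figure annotations on the assumption that they are the probabilities of the three objects lying inside the query region.

```python
def count_distribution(probs):
    """Expand prod_i (1 - p_i + p_i * x) one factor at a time.

    Returns a list c where c[j] is the probability that exactly j of the
    (independent) objects satisfy the query. Runs in O(N^2) instead of
    enumerating all 2^N possible worlds.
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object is not a result: degree unchanged
            nxt[j + 1] += c * p      # object is a result: degree increases by one
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
for j, c in enumerate(dist):
    print(f"P(exactly {j} results) = {c:.3f}")
```

Because the objects are anonymized, worlds with the same number of results fall into the same class, so N + 1 coefficients replace 2^N possible worlds.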

41

Count Queries on Uncertain Data

[Figure: example with query q and uncertain objects A, B, C, annotated with the values 0.8, 0.2 and 0.4]


45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
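The sampling procedure can be sketched as below. It is only an illustration: the function name, the fixed seed and the three object probabilities (0.8, 0.2, 0.4, assumed to be the probabilities of each object being a result) are assumptions, and a count query is used as the query predicate.

```python
import random


def sample_count_distribution(probs, num_samples, seed=0):
    """Monte-Carlo estimate of the distribution of the number of results.

    Each sample materializes one possible world by deciding independently,
    for every object, whether it satisfies the query.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    counts = [0] * (len(probs) + 1)
    for _ in range(num_samples):
        k = sum(1 for p in probs if rng.random() < p)  # evaluate the query on this world
        counts[k] += 1
    return [c / num_samples for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=20000)
print(estimate)
```

With more samples the estimates approach the exact probabilities, but a single run carries no indication of how reliable each estimate is.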

48

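Sampling Example

[Figure: query q and uncertain objects A, B, C, annotated with the values 0.8, 0.2 and 0.4]

Drawing 100 possible worlds yields estimators of the result probabilities, which can be compared to the exact probabilities

No indication of the reliability or confidence of the estimations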

49

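Sampling Confidences

[Figure: query q and uncertain objects A, B, C, annotated with the values 0.8, 0.2 and 0.4]

Drawing 100 possible worlds yields estimators of the result probabilities

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]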
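The interval can be reproduced with the standard Wald formula. In this sketch the estimate p̂ = 0.54 is not stated on the slide: it is inferred as the midpoint of the reported interval [0.442, 0.638], with n = 100 samples and α = 0.05 as further assumptions; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # percentile of the standard normal
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: 54 of 100 sampled worlds satisfy the query.
lower, upper = wald_interval(0.54, 100)
print(round(lower, 3), round(upper, 3))
```

The half-width shrinks with the square root of n, so halving the interval requires four times as many samples.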

50

Uncertain Spatial Data Management Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the XY plane as a 2D route]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: timeline with the current time marked "now"]

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with query point q]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

(e.g., possible routes to be taken)

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given: a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants tb, t1 and te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with the time instants tb, tb1, t1, t11 and te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals of its parent's interval]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a probability of 0 of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the pdf of Tri over the region within distance Rd of Q (a double integral)

The probability that a given object Trj is the NN of Q: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
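The two probabilities described in words above can be written out as follows. This is a sketch in assumed notation, not the slide's original formulas: $U_i$ denotes the uncertainty disk of $Tr_i$, $P^{WD}_i$ the within-distance probability, and $pdf^{WD}_j$ its derivative with respect to $R_d$.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q:
P^{WD}_i(R_d) \;=\; \iint_{\{(x,y) \in U_i \,:\, \mathrm{dist}((x,y),\,Q) \le R_d\}} pdf_i(x, y)\; dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d AND all other objects are farther away than R_d,
% integrated over all possible R_d (the integrand vanishes beyond R_max).
P^{NN}_j \;=\; \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\; dR_d,
\qquad pdf^{WD}_j(R_d) = \frac{d}{dR_d}\, P^{WD}_j(R_d)
```

The product term uses the independence of the objects' locations; without it the joint distribution would have to be integrated directly.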

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1, …, Tr5, with Rmin, Rmax and points Z1, Z2 such that dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2 over disks of diameter 2r in the (x, y) plane]

Example (assuming a uniform pdf of possible locations): just two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
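The convolution argument can be illustrated on a deliberately simplified case. The sketch below assumes one-dimensional, discrete locations given as `{value: probability}` dictionaries (the slides work with continuous 2D pdfs); the function name `difference_pmf` is hypothetical.

```python
def difference_pmf(pmf_i, pmf_q):
    """Distribution of V_i - V_q for independent discrete random variables.

    Equivalent to convolving the pmf of V_i with the pmf of -V_q.
    """
    out = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            d = xi - xq
            out[d] = out.get(d, 0.0) + pi * pq  # independence: probabilities multiply
    return out


uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # uniform location on three cells
diff = difference_pmf(uniform, uniform)

# Probability that the two objects are within distance 1 of each other.
within_one = sum(p for d, p in diff.items() if abs(d) <= 1)
print(sorted(diff.items()), within_one)
```

Once the difference distribution is known, the uncertain query object can be treated as crisp (located at the origin), which brings the problem back to the single-uncertainty case.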

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) in the (x, y) plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form $d_{iq}(t) = \sqrt{A t^2 + B t + C}$: a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
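A short derivation shows where the coefficients come from. The symbols $x_0$ and $y_0$ (the expected position of $TR_{iq}$ at $t = 0$) are assumptions introduced for this sketch, and constant velocity within the considered segment is assumed.

```latex
% Expected location along TR_iq, relative to the querying object:
x_{iq}(t) = x_0 + V^x_{iq}\, t, \qquad y_{iq}(t) = y_0 + V^y_{iq}\, t

% Distance from the origin:
d_{iq}(t) = \sqrt{x_{iq}(t)^2 + y_{iq}(t)^2} = \sqrt{A\,t^2 + B\,t + C}

% with coefficients
A = (V^x_{iq})^2 + (V^y_{iq})^2, \qquad
B = 2\,\bigl(x_0 V^x_{iq} + y_0 V^y_{iq}\bigr), \qquad
C = x_0^2 + y_0^2
```

$A$ is a sum of squares, so it is strictly positive whenever the relative velocity is non-zero, and the discriminant $B^2 - 4AC \le 0$ by the Cauchy-Schwarz inequality, so the radicand never becomes negative.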

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time points

[Figure: example of the lower envelope LE12 of two distance-trajectories TR1 and TR2 over time (horizontal axis), with critical time points t11 and t12 after tb]
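The "at most twice" observation follows from comparing squared distances, which are quadratics in t. The sketch below assumes each distance function is represented by the coefficient triple (A, B, C) of its squared form A·t² + B·t + C; the function name and the sample coefficients are hypothetical.

```python
import math


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions intersect.

    f and g are (A, B, C) triples of the squared distances A*t^2 + B*t + C.
    Distances are non-negative, so they are equal exactly when their squares are.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []          # parallel or identical functions: no isolated crossing
        roots = [-c / b]       # the difference is linear: one crossing
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []          # the functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# d1(t)^2 = (t - 2)^2 + 1 and d2(t)^2 = 2 intersect where t^2 - 4t + 3 = 0.
points = critical_time_points((1.0, -4.0, 5.0), (0.0, 0.0, 2.0), tb=0.0, te=10.0)
print(points)  # [1.0, 3.0]
```

Between consecutive critical time points the ordering of the two functions cannot change, which is what makes the interval-based lower envelope well defined.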

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) over [tb, te] are merged into LE1234, which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1, …, TR7 over [tb, te] with critical time points t1, t2, t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

73

Combine the two probabilitieshellip

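[Figure: possible paths of object a, and its possible locations when t = 4, with the segment m–p1 highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):

$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(S \cap PL_p(t)\bigr)$$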

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" along road-network segments from q over the vertices A, B, C, D, E; given the earliest/latest times of arrival, one route is a possible path but definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
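The idea of trading an acceptable error for data size can be sketched with the basic, recursive Douglas-Peucker line simplification. Note that this is the plain textbook variant, not the sped-up algorithm of the cited paper, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original polyline."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]  # every intermediate point is within the error bound
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right            # the split point appears in both halves; keep it once


route = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
simplified = douglas_peucker(route, epsilon=1.0)
print(simplified)  # [(0, 0), (2, 3), (4, 0)]
```

The dropped points are exactly where uncertainty is introduced: after the reduction, the true location is only known to lie within epsilon of the simplified polyline.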

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far…

Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t) on a highway, bounded by the minimum and maximum speeds vmin and vmax, with a query window Q]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution… the car is inside Q with probability 50% at the first time step of the window and with probability 30% at the second

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

This result includes worlds that violate the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: an interior hole with probabilities 0.4 (left neighbour), 0.2 (stay) and 0.4 (right neighbour); a border hole with probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

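› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$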

94

How can we model this

› First hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
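The vector-matrix products above can be reproduced with a few lines of plain Python. This is a minimal sketch with hypothetical helper names; it rebuilds the 9×9 transition matrix from the stated probabilities and propagates the initial distribution.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the ball example (rows: from bucket, columns: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border: stay or move right
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border: move left or stay
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: multiply the row vector dist by M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, hits):
    """Distribution after the given number of hits."""
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = build_transition_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball starts in the third bucket

after_two = propagate(start, M, 2)
after_forty = propagate(start, M, 40)
print([round(p, 2) for p in after_two])
print([round(p, 3) for p in after_forty])
```

Repeated multiplication drives the distribution towards the stationary vector (0.08, 0.12, …, 0.12, 0.08), which is left unchanged by a further hit; after 40 hits the result is close to it but still depends slightly on the starting bucket.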

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 with the transitions s1→s3 (1.0), s2→s1 (0.6), s2→s3 (0.4), s3→s2 (0.8) and s3→s3 (0.2), unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

State distributions over time, starting in s1 at t = 0:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)

Worlds in s1 or s2 at t = 2 already satisfy the query (probability 0.8). Propagating only the remaining mass (0, 0, 0.2) to t = 3 yields (0, 0.16, 0.04), so another 0.16 enters the window

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories, and two matrices

103

ST - Window Queries

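States s1, s2, s3, plus a new state s4 for trajectories that have intersected the window

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$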
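The two-matrix technique can be sketched as follows. This is an illustration rather than the authors' implementation: the function names are hypothetical, and the construction rule for M+ (all transitions into a window state are redirected to the absorbing hit state s4) is an assumption read off the matrices above.

```python
# Base transition matrix M from the slides (rows: from-state, columns: to-state).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def vec_mat(v, A):
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def window_matrices(M, window):
    """Build M- and M+ with one extra absorbing state for hit worlds."""
    n = len(M)
    hit = n
    m_minus = [[0.0] * (n + 1) for _ in range(n + 1)]
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = M[i][j]
            if j in window:
                m_plus[i][hit] += M[i][j]  # entering the window: redirect to the hit state
            else:
                m_plus[i][j] = M[i][j]
    m_minus[hit][hit] = 1.0                # once hit, always hit
    m_plus[hit][hit] = 1.0
    return m_minus, m_plus


def window_query_probability(M, start, window, query_times, horizon):
    """Probability that the object is in a window state at some tick in query_times."""
    n = len(M)
    m_minus, m_plus = window_matrices(M, window)
    v = [0.0] * (n + 1)
    if 0 in query_times and start in window:
        v[n] = 1.0                         # the starting position already satisfies the query
    else:
        v[start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in query_times else m_minus)
    return v[n]


p = window_query_probability(M, start=0, window={0, 1}, query_times={2, 3}, horizon=3)
print(round(p, 2))  # 0.96
```

Only one vector-matrix product per tick is needed, so the exponential number of possible paths is never enumerated.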

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


37

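Generating Functions

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

[Figure: query q with three uncertain objects, each relabeled x, with probabilities 0.8, 0.2 and 0.4 (labels H, H3)]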


Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left( (1 - p_i) + p_i \cdot x \right)$$

In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
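As a hedged illustration of the generating-function idea, the sketch below expands the product one object at a time. The probabilities 0.8, 0.2 and 0.4 are the values shown in the running figure; the function name `count_distribution` is hypothetical.

```python
from typing import List

def count_distribution(probs: List[float]) -> List[float]:
    """Coefficients of prod_i ((1 - p_i) + p_i * x).
    coeffs[k] is the probability that exactly k objects are inside the query region."""
    coeffs = [1.0]                      # empty product: zero objects processed
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: count stays k
            nxt[k + 1] += c * p         # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])      # probabilities of 0, 1, 2 and 3 results
```

Each multiplication costs time linear in the current polynomial length, so N objects need O(N²) operations instead of enumerating 2^N possible worlds.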

41

Count Queries on Uncertain Data

[Figure: query q with uncertain objects A, B, C and probabilities 0.8, 0.2 and 0.4 (labels H, H3)]

[Example: step-by-step expansion of the generating function for A, B and C; equations not recoverable]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time
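To make the partitioning concrete, here is a small sketch for a count query. It enumerates all 2^N possible worlds by brute force and groups them by their query result, which shows that only N+1 classes of equivalent worlds exist. The probabilities are those of the running example; the function name is hypothetical, and the brute-force enumeration is for illustration only (an efficient method would compute the class probabilities directly).

```python
from collections import defaultdict
from itertools import product

def classes_by_enumeration(probs):
    """Group all possible worlds by their count-query result and sum their probabilities."""
    classes = defaultdict(float)
    num_worlds = 0
    for world in product([0, 1], repeat=len(probs)):   # 1 = object inside the query region
        weight = 1.0
        for inside, p in zip(world, probs):
            weight *= p if inside else (1.0 - p)
        classes[sum(world)] += weight                  # worlds with equal count are equivalent
        num_worlds += 1
    return dict(classes), num_worlds

classes, num_worlds = classes_by_enumeration([0.8, 0.2, 0.4])
print(num_worlds, "possible worlds,", len(classes), "classes of equivalent worlds")
```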

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
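The sampling procedure can be sketched as follows for a count query. This is an illustrative sketch only: the object probabilities come from the running example, while the function name, the fixed seed and the sample sizes are assumptions.

```python
import random
from collections import Counter

def sample_count_distribution(probs, num_samples, seed=0):
    """Estimate P(exactly k objects inside the query region) from sampled possible worlds."""
    rng = random.Random(seed)                 # fixed seed only to make the sketch repeatable
    counts = Counter()
    for _ in range(num_samples):
        world = [rng.random() < p for p in probs]   # draw one possible world
        counts[sum(world)] += 1                     # evaluate the query on that world
    return {k: counts[k] / num_samples for k in range(len(probs) + 1)}

estimate_100 = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
estimate_large = sample_count_distribution([0.8, 0.2, 0.4], num_samples=200_000)
print(estimate_100)
print(estimate_large)
```

With only 100 worlds the estimates can be visibly off; the larger run lands close to the exact values, at a correspondingly higher cost.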

48

Sampling Example

Drawing 100 possible worlds may yield the following estimators: [values not recoverable]

Compare to the exact probabilities: [values not recoverable]

No indication of the reliability or confidence of the estimations

49

Sampling Confidences

Drawing 100 possible worlds may yield the following estimators: [values not recoverable]

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

[Figure label: true probability]
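A short sketch of the Wald interval follows. The estimator value and significance level did not survive in the slide text, so the inputs below are inferred and should be read as assumptions: an estimate of 0.54 from n = 100 samples at α = 0.05 reproduces the stated interval [0.442, 0.638].

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs (inferred from the interval quoted above, not stated in the source).
low, high = wald_interval(p_hat=0.54, n=100, alpha=0.05)
print(round(low, 3), round(high, 3))
```

Quadrupling the number of samples halves the interval width, which is the usual trade-off between sampling cost and confidence.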

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– “Best guess” answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time up to the present time, and its projection onto the X-Y plane as a 2D route]

The approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

– Sequence of (location, time) updates (e.g., GPS-based)

– Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

– (location, time, velocity vector) updates

[Figure: the three update models along a time axis, with the marker “now ->”]

55

Modeling Uncertainty

Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
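The bead (space-time prism) between two exact samples can be sketched as a simple membership test: a location is possible at time t only if it is reachable from the earlier sample and the later sample is still reachable from it, both at the bounded maximum speed. The function name, the sample tuples and the speed bound below are hypothetical.

```python
import math

def in_bead(p, t, sample_a, sample_b, v_max):
    """True if location p = (x, y) is possible at time t between two exact samples (x, y, t)."""
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = math.hypot(p[0] - xa, p[1] - ya) <= v_max * (t - ta)
    can_still_reach_b = math.hypot(xb - p[0], yb - p[1]) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b

a, b = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)    # hypothetical samples
print(in_bead((5.0, 3.0), 5.0, a, b, v_max=2.0))   # inside both reachability disks
print(in_bead((5.0, 9.0), 5.0, a, b, v_max=2.0))   # too far from either sample
```

At a fixed t the possible locations form the intersection of two disks; chaining the beads of consecutive samples gives the necklace.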

57

Modeling Uncertain Trajectories

Model 2: Constraints

– Motion along a road network

– Uncertainty restricted to road segments

[Figure: road-network space-time prism between samples p1 (at t1) and p2 (at t2), with a query point q and time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given: a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the planes T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the planes T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree with root [TrQ, tb, te]; its children are nodes (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), …, (Trim, [t(m-1), te], Dm), each of which has descendant nodes over sub-intervals, e.g., (Tri11, [tb, t11], D11) and (Tri12, [t11, t12], D12)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1 … Tr5; Rmin and Rmax are marked around Q, a distance Rd is marked for Tr1, and for Tr4 Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_{i}(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y)\, dx\, dy$$

where $D(Q, R_d)$ is the disk of radius $R_d$ centered at Q

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \cdot \prod_{i \neq j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$$

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
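The pruning observation and the NN probability can be illustrated numerically. The sketch below does not evaluate the integral; it swaps in Monte-Carlo sampling under the assumption of uniform pdfs over the uncertainty disks. The object names, disk positions and sample count are hypothetical.

```python
import math
import random

def nn_probabilities(q, disks, num_samples=20000, seed=0):
    """Estimate per object the probability of being the NN of the crisp point q.
    disks maps an object id to (center_x, center_y, radius); pdfs are assumed uniform."""
    rng = random.Random(seed)
    center = {k: math.hypot(cx - q[0], cy - q[1]) for k, (cx, cy, r) in disks.items()}
    min_dist = {k: max(0.0, center[k] - disks[k][2]) for k in disks}
    max_dist = {k: center[k] + disks[k][2] for k in disks}
    r_max = min(max_dist.values())                  # some object is surely within r_max
    candidates = [k for k in disks if min_dist[k] <= r_max]   # all others are pruned
    wins = {k: 0 for k in disks}
    for _ in range(num_samples):
        best, best_d = None, float("inf")
        for k in candidates:
            cx, cy, r = disks[k]
            rad = r * math.sqrt(rng.random())       # uniform sampling inside a disk
            ang = 2.0 * math.pi * rng.random()
            d = math.hypot(cx + rad * math.cos(ang) - q[0], cy + rad * math.sin(ang) - q[1])
            if d < best_d:
                best, best_d = k, d
        wins[best] += 1
    return {k: wins[k] / num_samples for k in disks}

disks = {"Tr1": (2.0, 0.0, 1.0), "Tr2": (0.0, 3.0, 1.5), "Tr4": (10.0, 10.0, 1.0)}
probs = nn_probabilities((0.0, 0.0), disks)
print(probs)   # Tr4 is pruned: its minimum distance exceeds r_max
```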

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1 … Tr5 with Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2 over the x-y plane, each over an uncertainty disk of diameter 2r]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

$$P(\lVert V_i - V_q \rVert \le R_d)$$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
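The convolution argument can be checked on a deliberately simplified case. The sketch below uses discrete 1D locations instead of the 2D uniform disks of the slides (an assumption made purely to keep the example small); the distribution of the difference of two independent variables is the convolution of one pmf with the mirrored other pmf.

```python
from fractions import Fraction

def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i + (-V_q) for independent V_i and V_q.
    This is the convolution of pmf_i with the mirrored pmf_q."""
    out = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] = out.get(xi - xq, Fraction(0)) + pi * pq
    return out

third = Fraction(1, 3)
pmf_i = {0: third, 1: third, 2: third}     # hypothetical uniform locations of Tr_i
pmf_q = {0: third, 1: third, 2: third}     # hypothetical uniform locations of Tr_q
pmf_iq = difference_pmf(pmf_i, pmf_q)

r_d = 1
within = sum(p for d, p in pmf_iq.items() if abs(d) <= r_d)   # P(|V_i - V_q| <= R_d)
print(sorted(pmf_iq.items()), within)
```

After this transformation the uncertain query behaves like a crisp query at the origin, which is why the single-uncertain-object reasoning becomes applicable again.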

67

[Figure: the pdfs of the differences, pdf(Tr1 - Trq) and pdf(Tr2 - Trq), over the x-y plane, each over a disk of diameter 4r; x-axis marked from -40 to +40]

Let Viq = Vi + (–Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs)
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{xiq}^2 + V_{yiq}^2$$

where B and C are constants determined by the relative start position and velocity. This is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example: lower envelope LE12 of the two distance-trajectories TR1 and TR2 over [tb, te], with critical time-points t11 and t12. NOTE: time-dimension on the horizontal axis, distance on the vertical axis]
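The claim that two distance functions intersect at most twice follows from comparing squared distances, which are quadratics in t. A minimal sketch, assuming linear relative motion with a start offset (x0, y0) and relative velocity (vx, vy); all names and numbers are hypothetical:

```python
import math

def squared_distance_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for the relative position (x0 + vx t, y0 + vy t)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_time_points(f, g, tb, te):
    """Times in [tb, te] where the two distance functions are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:                       # difference is linear (or constant)
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

d1 = squared_distance_coeffs(1.0, 0.0, 0.0, 0.0)    # stays at distance 1
d2 = squared_distance_coeffs(-3.0, 0.0, 1.0, 0.0)   # passes through the origin at t = 3
print(critical_time_points(d1, d2, 0.0, 10.0))
```

Between consecutive critical time-points the order of the two distances cannot change, which is what the lower-envelope construction relies on.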

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
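The merge step can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the authors' algorithm: distance functions are given as coefficient triples (A, B, C) of their squares, ties are broken arbitrarily, and an envelope is a list of pieces (start, end, function id). All names are hypothetical.

```python
import math

def value(f, t):
    a, b, c = f
    return a * t * t + b * t + c

def crossings(f, g):
    """Times where two squared-distance quadratics are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    s = math.sqrt(disc)
    return sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})

def merge(env_a, env_b, funcs):
    """Merge two envelopes over the same interval by sweeping both piece lists once."""
    out, i, j, t = [], 0, 0, env_a[0][0]
    while i < len(env_a) and j < len(env_b):
        end = min(env_a[i][1], env_b[j][1])
        fa, fb = env_a[i][2], env_b[j][2]
        cuts = [t] + [x for x in crossings(funcs[fa], funcs[fb]) if t < x < end] + [end]
        for lo, hi in zip(cuts, cuts[1:]):
            mid = 0.5 * (lo + hi)            # the order is fixed between two cuts
            win = fa if value(funcs[fa], mid) <= value(funcs[fb], mid) else fb
            if out and out[-1][2] == win:
                out[-1] = (out[-1][0], hi, win)   # extend the previous piece
            else:
                out.append((lo, hi, win))
        t = end
        if env_a[i][1] == end:
            i += 1
        if env_b[j][1] == end:
            j += 1
    return out

def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer: split the functions, build both envelopes, merge them."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(lower_envelope(ids[:half], funcs, tb, te),
                 lower_envelope(ids[half:], funcs, tb, te), funcs)

funcs = {1: (0.0, 0.0, 1.0), 2: (1.0, -6.0, 9.0), 3: (0.0, 0.0, 4.0)}   # hypothetical
envelope = lower_envelope([1, 2, 3], funcs, 0.0, 10.0)
print(envelope)
```

Comparing squared distances is enough here because the square root is monotonic, so the envelope of the squares has the same pieces as the envelope of the distances.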

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous-pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the pruning band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform, each of the |PPi| possible paths is selected with probability 1/|PPi|

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)
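The set of possible paths can be enumerated with a bounded depth-first search. The sketch below makes two simplifying assumptions that are not in the source: both samples lie on vertices, and each directed edge carries its minimum travel time. The graph, vertex names and time budget are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose summed minimum travel time is <= budget.
    graph maps a vertex to {neighbor: minimum travel time of that edge}."""
    results = []

    def dfs(node, path, cost):
        if cost > budget:                    # cannot be completed in time: prune
            return
        if node == dst:
            results.append((list(path), cost))
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:              # simple paths only
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(src, [src], 0.0)
    return results

graph = {
    "pi": {"v1": 2.0, "v2": 3.0, "v3": 5.0},
    "v1": {"pj": 2.0},
    "v2": {"pj": 2.0},
    "v3": {"pj": 5.0},
    "pj": {},
}
paths = possible_paths(graph, "pi", "pj", budget=5.0)   # t_{i+1} - t_i = 5
uniform_probability = 1.0 / len(paths)                  # uniform pdf over possible paths
print(paths, uniform_probability)
```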

73

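Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4; m marks the segment mp1]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S \text{ at } t) = \sum_{p \in PP^{a}(t)} \Pr(p) \cdot \Pr\left(S \cap PL_{p}(t)\right)$$

where Pr(p) is the probability of path p and the second factor is the probability of being inside S along p, e.g., under a uniform pdf over the possible locations PL_p(t)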

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with objects A, B, C, D, E around q; the expansion tree “bubbles” along road-network segments. Annotation: judging by the earliest/latest times of arrival, a possible path exists, but the object is definitely not part of the answer to the range query]
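The network-expansion step can be sketched as a distance-bounded Dijkstra search. This is a simplified sketch: it assumes that q lies on a vertex, it returns vertices rather than (partial) edges, and it treats distance exactly r as inside the range. The graph and all names are hypothetical.

```python
import heapq

def network_expansion(graph, q, r):
    """Vertices within network distance r of q, plus the expansion tree as parent links.
    graph maps a vertex to {neighbor: edge length}."""
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd <= r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u                 # the edge (u, v) joins the expansion tree
                heapq.heappush(heap, (nd, v))
    return dist, parent

graph = {
    "q": {"A": 1.0, "B": 2.0},
    "A": {"q": 1.0, "C": 1.5},
    "B": {"q": 2.0, "D": 3.0},
    "C": {"A": 1.5, "E": 2.0},
    "D": {"B": 3.0},
    "E": {"C": 2.0},
}
dist, parent = network_expansion(graph, "q", r=3.0)
print(dist, parent)
```

Only the movement R-trees of edges touched by this expansion need to be fetched in the filtering step, which keeps the candidate set small.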

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997
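Trading a bounded error for a smaller data size can be illustrated with the basic Douglas-Peucker line simplification. Note that this sketch is the plain recursive algorithm, not the sped-up variant of the cited paper; the point list and the error bound are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed so that no dropped point deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = max(((i, point_segment_distance(points[i], a, b))
                     for i in range(1, len(points) - 1)), key=lambda item: item[1])
    if dmax <= eps:
        return [a, b]                         # everything in between is within the error
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                  # do not duplicate the split point

track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, eps=0.1)
print(simplified)
```

The chosen eps then becomes a known bound on the location uncertainty of the reduced trajectory.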

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t); the possible positions of the car are bounded by vmin and vmax, and the query region Q overlaps them at two time instants]

What is the probability that the car is in some area at least once during some time?

Assuming a uniform distribution… the car is inside Q with probability 50% at the first and 30% at the second time instant

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5
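The gap between the two results can be reproduced with a toy set of possible worlds. The ten equally likely worlds and the two membership sets below are hypothetical; they are chosen only so that the car is inside Q with probability 0.5 at the first and 0.3 at the second time instant, and so that every world inside Q at the second instant was already inside at the first.

```python
from fractions import Fraction

NUM_WORLDS = 10                      # hypothetical: ten equally likely possible worlds
inside_t1 = {0, 1, 2, 3, 4}          # worlds in which the car is inside Q at the first instant
inside_t2 = {0, 1, 2}                # worlds in which the car is inside Q at the second instant

p1 = Fraction(len(inside_t1), NUM_WORLDS)
p2 = Fraction(len(inside_t2), NUM_WORLDS)

# Treating the two time instants as independent over-counts the worlds.
independent = 1 - (1 - p1) * (1 - p2)

# Possible worlds semantics: count each world once if it is inside Q at least once.
dependent = Fraction(len(inside_t1 | inside_t2), NUM_WORLDS)

print(float(independent), float(dependent))
```

The independent formula silently combines positions that no single trajectory could visit under the speed constraints, which is where the extra 0.15 comes from.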

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

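› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: in the interior, the ball moves to either neighbour hole with probability 0.4 each and stays with probability 0.2; at the border, the probabilities are 0.6 and 0.4]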

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

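› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$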

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
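The hit-by-hit computation is a repeated vector-matrix product. The sketch below rebuilds the 9-bucket transition matrix of the example and propagates the initial position; the helper names are hypothetical. It also checks that the distribution shown for the 40th hit is the stationary distribution of the chain, which the propagated vector approaches.

```python
def build_board_matrix(n=9):
    """Transition matrix of the ball example: rows are 'from bucket', columns 'to bucket'."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(v, m):
    """One hit: row vector times transition matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m))]

M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball starts in the third bucket
first = step(start, M)
second = step(first, M)

v = start
for _ in range(40):
    v = step(v, M)

stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])
print([round(x, 2) for x in v])
```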

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3 with the edge probabilities of M, unrolled over t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

State distributions, starting in s1:
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

Propagating only the worlds that have not yet intersected the window: at t=2, a probability of 0.8 (s2) hits the window and 0.2 (s3) remains; at t=3, (0, 0, 0.2) · M = (0, 0.16, 0.04), so another 0.16 hits the window

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

ST - Window Queries

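State order: (s1, s2, s3, s4), where s4 is the new state for the winner trajectories

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)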
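The window query with the extra winner state can be sketched in a few lines. The matrix M, the window states s1 and s2 and the interval T = [2,3] are taken from the example; the helper names are hypothetical and the dense lists stand in for the sparse matrices a real system would use.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}    # s1 and s2 (0-based indices)
WINDOW_TIMES = {2, 3}     # T = [2,3]

def with_winner_state(m, redirect_hits):
    """Append an absorbing winner state. If redirect_hits is True (matrix M+), transitions
    into window states are redirected to the winner state; otherwise this builds M-."""
    n = len(m)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (redirect_hits and j in WINDOW_STATES) else j
            a[i][target] += m[i][j]
    a[n][n] = 1.0          # once a trajectory has hit the window, it stays a winner
    return a

def step(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M_minus = with_winner_state(M, redirect_hits=False)
M_plus = with_winner_state(M, redirect_hits=True)

v = [1.0, 0.0, 0.0, 0.0]  # the car starts in s1; nothing has hit the window yet
for t in (1, 2, 3):
    v = step(v, M_plus if t in WINDOW_TIMES else M_minus)

result = v[-1]             # probability mass collected in the winner state
print([round(x, 2) for x in v], round(result, 2))
```

Because every possible path ends either in the winner state or outside it, one vector per time step replaces the enumeration of exponentially many paths.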

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space; left with a single observation at t0, right with two observations at t0 and t1]

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
- Draw a sufficiently high number of samples
- Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
- Cleaning
- Approximation
- Paradigm of equivalent worlds

› Geometric models can be used
- when no probabilistic model is present
- to find possible answers (no confidence)

› Probabilistic management of UST data
- based on stochastic processes
- allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


38

Generating Functions

[Figure: query q with three uncertain objects, anonymized as x; probability labels 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c_i x^i$ implies that the probability of having exactly $i$ results equals $c_i$

40

Generating Functions Formally

For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left(1 - p_i + p_i x\right)$$

In the expanded polynomial, the coefficient of the monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
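The polynomial expansion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_distribution` is hypothetical, and the inclusion probabilities 0.8, 0.2 and 0.4 are assumed to be the labels shown in the running example figure.

```python
def count_distribution(probabilities):
    """Return [P(count = 0), ..., P(count = n)] by expanding prod(1 - p + p*x)."""
    coeffs = [1.0]  # start with the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object is outside the query region
            nxt[j + 1] += c * p      # object is inside the query region
        coeffs = nxt
    return coeffs


# Assumed inclusion probabilities of the three objects in the running example.
dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication costs time linear in the current degree, so the full count distribution of N objects is obtained in O(N²) time instead of enumerating 2^N possible worlds.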

41

Count Queries on Uncertain Data

[Figure: query q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4. The slides expand the generating function for this example step by step.]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. The traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of each class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of sampled results is an unbiased approximation of the true distribution of results
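The sampling steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: objects are treated as independent, the query is a count query, the function name `sample_count_distribution` is hypothetical, and the probabilities 0.8, 0.2 and 0.4 are assumed from the running example.

```python
import random
from collections import Counter


def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Estimate P(count = k) by materializing num_worlds possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_worlds):
        # One possible world: each object is independently inside or outside.
        world = [rng.random() < p for p in probabilities]
        counts[sum(world)] += 1  # evaluate the count query on this world
    return [counts[k] / num_worlds for k in range(len(probabilities) + 1)]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=20000)
print([round(e, 3) for e in estimate])
```

With more sampled worlds the estimates approach the exact coefficients of the generating function, at the cost of evaluating the query once per world.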

48

Sampling Example

[Figure: query q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of the estimates

49

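Sampling Confidences

[Figure: query q with uncertain objects A, B and C; probability labels 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g. Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]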
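A Wald confidence interval for a sampled probability can be computed as in the sketch below. The inputs are assumptions, not values stated on the slide: an estimate of 0.54 from n = 100 sampled worlds at α = 0.05 happens to reproduce the interval [0.442, 0.638] quoted above. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed inputs: estimate 0.54 from 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The interval narrows with the square root of the number of samples, so halving its width requires roughly four times as many sampled worlds.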

50

Uncertain Spatial Data Management: Summary

Motivation
- Flood of geo-spatial data
- Enriched with additional contexts (text, social, multimedia)
- Inherent uncertainty

Data Cleaning
- "Best guess" answers
- Unreliable results
- Biased results

Paradigm of Equivalent Worlds
- Efficient solution for the most prominent types of spatial queries
- Example: Generating Functions

Approximations
- Monte-Carlo sampling
- Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty - the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, its projection as a 2D route, and the present time]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure label: now ->]

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories: Model 2

Constraints:

Motion along road network

Uncertainty restricted to road segments

[Figure: road-network space-time prism between the samples p1 (at t1) and p2 (at t2), with a position q at time t]

Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants tb, t1 and te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 over the axes X, Y and T, with the time instants tb, tb1, t11, t1 and te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper nodes, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121), refine sub-intervals of their parent's interval.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

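Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and uncertain objects Tr1 to Tr5, each bounded by a circle; the radii Rmin, Rmax and Rd are marked, and for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by integrating the pdf of Tri over the part of its uncertainty region that lies within distance Rd from Q:

$$P^{WD}_{i}(R_d) = \iint_{\{(x,y)\,:\,dist((x,y),Q) \le R_d\}} pdf_i(x,y)\,dx\,dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_{j} = \int_{0}^{R_{max}} pdf^{WD}_{j}(R_d) \prod_{i \ne j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d$$

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…). Here $pdf^{WD}_{j}$ denotes the derivative of $P^{WD}_{j}$ with respect to Rd.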

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

[Figure: uncertain query TrQ and uncertain objects Tr1 to Tr5, with the radii Rmin and Rmax and two points Z1 and Z2 such that dist(Z1, Z2) < Rmax]

Example: We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over the x-y plane; each is a disk of diameter 2r with height 1/(πr²)]

Example: assuming a uniform pdf of the possible locations, the figure shows only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi - Vq (cross-correlation) as another random variable. Since Viq = Vi + (-Vq), the independence of Vi and Vq implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the x-y plane; each has a support of diameter 4r and a peak value labeled 3/(4πr²); the axis is marked from -40 to +40]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

which is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time points

[Figure: Example of the lower envelope LE12 of the two distance functions TR1 and TR2, with the critical time points t11 and t12 after tb. NOTE: time dimension on the horizontal axis, distance on the vertical axis]
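The "at most two intersections" observation follows because two distances are equal exactly when their squares are equal, and the difference of two squared distance functions is a quadratic in t. The sketch below illustrates this under the assumption of linear relative motion; the helper names `squared_distance_coeffs` and `critical_times` and the sample values are hypothetical.

```python
from math import sqrt


def squared_distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for relative motion
    starting at (x0, y0) with relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        # Degenerate case: the difference is linear (or constant).
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            r = sqrt(disc)
            roots = sorted({(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical example: one object passes by the query, the other keeps distance 3.
d1 = squared_distance_coeffs(-4.0, 1.0, 1.0, 0.0)  # d1(t)^2 = t^2 - 8t + 17
d2 = squared_distance_coeffs(0.0, 3.0, 0.0, 0.0)   # d2(t)^2 = 9
times = critical_times(d1, d2, 0.0, 10.0)
print([round(t, 3) for t in times])  # [1.172, 6.828]
```

Between two consecutive critical time points the order of the two distance functions cannot change, which is what makes a lower-envelope construction over the interval possible.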

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11 and t12) and LE34 (of TR3 and TR4, critical time point t31) over [tb, te] are merged into LE1234, which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) away from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with the critical time points t1, t2 and t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti

Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t)

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4, with a segment mp1]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. uniform pdf):

$$\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\left(S \cap PL_p(t)\right)$$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
- The collection of all the possible paths (and probabilities)
- Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
- Edge hash-table
  › Small, in-memory
- Movement R-tree
  › 1D, for each edge
- Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
- Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
- Fetch the movement R-trees of the edges in the expansion tree
- Search leaf entries whose maximum time interval intersects with tq
- The objects pointed to by those leaf entries are candidates

› Refinement
- The candidate set is considerably smaller than the original trajectory dataset
- Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" from q along road-network segments, with objects A to E; judging by the earliest/latest times of arrival, one path is possible but definitely not part of the answer to the range query]

77

Temporal-continuous query
- Only calculate the QPs for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
- Easy to prove that the actual QP is monotonic in between consecutive critical time instants
- The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty - the flip-side

› Data reduction
- Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
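As an illustration of "define an acceptable error, save on data size", the sketch below implements the basic recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper, it measures the distance to the segment rather than to the infinite line, and the function names and sample points are hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # all intermediate points are within the error bound
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(route, 0.5))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The error bound epsilon is exactly the kind of location uncertainty that the receiver of the reduced data then has to account for in query processing.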

79

Uncertainty - the other flip-side

› What are the important properties not to be distorted by the reduction?
- Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
- It is not just a "Z" axis… one cannot travel back in time
- How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
- Discretize the time
- For each time, a pdf (pmf) over the possible positions is given

› Examples
- Nearest neighbour queries on uncertain moving objects
- Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t) on a highway, bounded by the minimum and maximum speed (vmin, vmax), with a query window Q; the window covers 50% and 30% of the possible positions at two points in time]

What is the probability that the car is in some area at least once during some time?

Assuming a uniform distribution…

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

This violates the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
- Discrete time + discrete space (e.g. Markov chain)
- Discrete time + continuous space (e.g. Harris chain)
- Continuous time + discrete space (e.g. Markov process)
- Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of an inner hole: 0.4 (left), 0.2 (stay), 0.4 (right); of a border hole: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

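How can we model this

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$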

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
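The evolution of the ball's position distribution can be reproduced with plain vector-matrix multiplication. The sketch below is illustrative only: it uses dense Python lists rather than the sparse matrices a real system would use, and the names `build_transition_matrix` and `step` are hypothetical.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the board example: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4  # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit of the board: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_transition_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball starts in the third bucket
after_one = step(start, M)
after_two = step(after_one, M)
print([round(p, 2) for p in after_one])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_two])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```

Repeating `step` many times drives the distribution towards the stationary distribution (0.08, 0.12, …, 0.12, 0.08), independent of the starting bucket.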

95

Fusion of Model and Reality

› Discretization of time and space
- E.g. treat intersections as states and add additional states on long streets
- The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
- Transition probabilities from one state to another are learned from historical data (very sparse matrix)
- The transition matrix can change over time and differ between object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with edges labeled by the transition probabilities]

Note: We have an exponential number of possible paths the car might take

Starting in s1 at t = 0, the state distributions are:
- t = 0: (1, 0, 0)
- t = 1: (0, 0, 1)
- t = 2: (0, 0.8, 0.2); the worlds in s2 (probability 0.8) intersect the window
- t = 3: (0.48, 0.16, 0.36) over all worlds; restricted to the worlds that have not intersected the window at t = 2: (0, 0.16, 0.04), of which the worlds in s2 (probability 0.16) intersect the window

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

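ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State vectors over (s1, s2, s3, s4), where s4 is the new state collecting the worlds that have intersected the window:

$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$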
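The two-matrix scheme can be sketched as follows: transitions into the query window at a window time are redirected to an absorbing "hit" state. This is a minimal dense-matrix illustration, not the authors' implementation; the function names and the 0-based state indices are assumptions.

```python
def vec_mat(v, m):
    """Multiply the row vector v by the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_probability(m, start, window_states, window_times, horizon):
    """Probability of visiting a window state at least once at a window time."""
    n = len(m)
    if 0 in window_times and start in window_states:
        return 1.0
    # M-: original transitions plus an absorbing hit state (index n).
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    # M+: transitions into a window state are redirected to the hit state.
    m_plus = []
    for row in m:
        kept = [0.0 if j in window_states else p for j, p in enumerate(row)]
        redirected = sum(p for j, p in enumerate(row) if j in window_states)
        m_plus.append(kept + [redirected])
    m_plus.append([0.0] * n + [1.0])
    v = [0.0] * (n + 1)
    v[start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[n]


M3 = [[0.0, 0.0, 1.0],
      [0.6, 0.0, 0.4],
      [0.0, 0.8, 0.2]]
# States s1, s2, s3 are indices 0, 1, 2; the window is {s1, s2} during T = [2, 3].
result = window_probability(M3, start=0, window_states={0, 1},
                            window_times={2, 3}, horizon=3)
print(round(result, 2))  # 0.96
```

Because hit worlds are absorbed, the answer is read off a single vector entry after a linear number of multiplications, without enumerating the exponentially many paths.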

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
- 2|S| classes of equivalent worlds
- One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
- One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

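Multiple Observations

State classes, in vector order: S1-, S2-, S3-, S1+, S2+, S3+ ("-": the window has not been intersected yet (not ∎), "+": the window has been intersected)

State vectors:
- t = 0: (1, 0, 0, 0, 0, 0)
- t = 1: (0, 0, 1, 0, 0, 0)
- t = 2: (0, 0, 0.2, 0, 0.8, 0)
- t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$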

107

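Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the query window and obs the observation in s3:

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$

State vector at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$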
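The conditioning on a second observation can be sketched by tracking, for every state, the probability mass of worlds that have and have not intersected the window, and then applying Bayes' theorem at the observation time. This is an illustrative sketch with hypothetical names and 0-based state indices; it assumes the observation is exact and made at the final time step.

```python
def conditional_window_probability(m, start, window_states, window_times,
                                   obs_time, obs_state):
    """P(window intersected | object observed in obs_state at obs_time)."""
    n = len(m)
    # v[i]: worlds in state i that have not hit the window (class Si-),
    # v[n + i]: worlds in state i that have hit the window (class Si+).
    v = [0.0] * (2 * n)
    hit_at_start = 0 in window_times and start in window_states
    v[start + (n if hit_at_start else 0)] = 1.0
    for t in range(1, obs_time + 1):
        nxt = [0.0] * (2 * n)
        in_window_time = t in window_times
        for i in range(n):
            for j in range(n):
                p = m[i][j]
                if p == 0.0:
                    continue
                # Non-hit worlds become hit worlds when entering the window.
                target = n + j if (in_window_time and j in window_states) else j
                nxt[target] += v[i] * p
                nxt[n + j] += v[n + i] * p  # hit worlds stay hit worlds
        v = nxt
    hit, miss = v[n + obs_state], v[obs_state]
    if hit + miss == 0.0:
        raise ValueError("the observation has probability zero under the model")
    return hit / (hit + miss)


M3 = [[0.0, 0.0, 1.0],
      [0.6, 0.0, 0.4],
      [0.0, 0.8, 0.2]]
# Start in s1 at t = 0, window {s1, s2} during T = [2, 3], object seen in s3 at t = 3.
posterior = conditional_window_probability(M3, 0, {0, 1}, {2, 3}, 3, 2)
print(round(posterior, 2))  # 0.89
```

The observation simply selects the two classes S3- and S3+ and renormalizes them, which is the division 0.32 / (0.32 + 0.04) shown above.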

108

Summary

› Pros
- Allows answering time-parametrized queries according to possible worlds semantics
- Considers location dependencies over time
- Scales up very well since it is purely based on sparse matrix multiplications
- Natively extendable to uncertain observations
- Seems to work adequately on real-world data (more validation needed)

› Cons
- Discrete time and space
- The mapping from time to ticks might not be the perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over the location and time axes; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
- Draw a sufficiently high number of samples
- Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


39

Generating Functions

[Figure: query region q and uncertain objects, each labelled x; values 0.8, 0.2 and 0.4]

Main idea: Use polynomial multiplication to enumerate possible results

Observation 3: Anonymize objects - substitute A, B, C by x

Each monomial $c \cdot x^i$ implies that the probability of having exactly $i$ results equals $c$

40

Generating Functions Formally

For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:

$$F(x) = \prod_{i=1}^{N} \left(1 - p_i + p_i \cdot x\right)$$

In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
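The polynomial expansion can be sketched in a few lines of Python. This is only an illustration, assuming the generating function has the product form $\prod_i (1 - p_i + p_i x)$ and that the values 0.8, 0.2 and 0.4 from the running example are the probabilities of the three objects being inside the query region; the function name `count_distribution` is hypothetical.

```python
from typing import List


def count_distribution(probs: List[float]) -> List[float]:
    """Expand prod_i (1 - p_i + p_i * x) and return its coefficients.

    coeffs[k] is the probability that exactly k objects satisfy the query,
    assuming the objects are mutually independent.
    """
    coeffs = [1.0]  # the constant polynomial 1 (no object processed yet)
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object is outside: count stays k
            nxt[k + 1] += c * p      # object is inside: count becomes k + 1
        coeffs = nxt
    return coeffs


# Assumed probabilities of the three objects being inside the query region.
dist = count_distribution([0.8, 0.2, 0.4])
for k, pr in enumerate(dist):
    print(f"P(exactly {k} objects in the query region) = {pr:.3f}")
```

Each multiplication step costs time linear in the current polynomial degree, so the whole distribution is obtained in O(N²) for N objects instead of enumerating all 2^N possible worlds.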

41

Count Queries on Uncertain Data

[Figure: query q and uncertain objects A, B and C; values 0.8, 0.2 and 0.4]

Example

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
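The sampling steps can be sketched as follows. This is a minimal illustration, assuming independent objects whose probabilities of being inside the query region are 0.8, 0.2 and 0.4 (taken from the running example as an assumption) and a count query as the predicate; `sample_count_distribution` is a hypothetical helper name.

```python
import random
from collections import Counter
from typing import Dict, List


def sample_count_distribution(
    probs: List[float], n_samples: int, seed: int = 0
) -> Dict[int, float]:
    """Estimate P(exactly k objects satisfy the query) by sampling worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts: Counter = Counter()
    for _ in range(n_samples):
        # Materialize one possible world: every object is independently
        # inside the query region with its own probability.
        world = [rng.random() < p for p in probs]
        # Evaluate the query predicate (here: a count query) on this world.
        counts[sum(world)] += 1
    return {k: counts[k] / n_samples for k in range(len(probs) + 1)}


estimate = sample_count_distribution([0.8, 0.2, 0.4], n_samples=100_000)
print(estimate)
```

With only 100 samples, as on the slides, the estimates fluctuate noticeably; the larger sample size here is chosen only so that the estimate is visibly close to the exact distribution.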

48

Sampling Example

[Figure: query q and uncertain objects A, B and C; values 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

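Sampling Confidences

[Figure: query q and uncertain objects A, B and C; values 0.8, 0.2 and 0.4; the sampled estimate is compared with the true probability]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]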
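A Wald-type interval of the form $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ can be computed with the Python standard library. The values $\hat{p} = 0.54$ and $\alpha = 0.05$ below are assumptions: they are not stated in the extracted slide text, but together with n = 100 samples they reproduce the interval [0.442, 0.638] quoted above.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_interval(p_hat: float, n: int, alpha: float) -> Tuple[float, float]:
    """Two-sided Wald confidence interval for a sampled probability."""
    # (1 - alpha/2) percentile of the standard normal distribution
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1], since the bounds describe a probability.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed estimate from 100 sampled worlds at an assumed 5% significance level.
low, high = wald_interval(p_hat=0.54, n=100, alpha=0.05)
print(f"[{low:.3f}, {high:.3f}]")
```

The Wald interval is known to behave poorly for estimates close to 0 or 1 and for small n; it is used here only because the slide names it.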

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory over (X, Y, Time), up to the present time]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between sample p1 at time t1 and sample p2 at time t2, with query point q and time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time instants T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time instants T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children that subdivide its time-interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and so on recursively]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and uncertainty disks of Tr1 … Tr5; Rmax bounds the distance of the nearest neighbor, and Tr4 has Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$$P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy$$

The probability that a given object Trj is the NN of Q is given by

$$P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d$$

where $pdf^{WD}_j$ is the derivative of $P^{WD}_j$ with respect to $R_d$; i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
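The Rmax-based pruning rule for a crisp query point can be sketched as follows. This is a simplified illustration, assuming every uncertain object is given as a disk (center and radius) of possible locations; the object names and coordinates are made up for the example.

```python
import math
from typing import Dict, List, Tuple

Disk = Tuple[float, float, float]  # (center_x, center_y, radius)


def nn_candidates(q: Tuple[float, float], disks: Dict[str, Disk]) -> List[str]:
    """Return the objects that may have a non-zero NN-probability w.r.t. q."""

    def center_dist(d: Disk) -> float:
        return math.hypot(d[0] - q[0], d[1] - q[1])

    # Rmax: the smallest of the maximal possible distances. At least one
    # object is guaranteed to be within Rmax of q.
    r_max = min(center_dist(d) + d[2] for d in disks.values())
    # An object whose minimal possible distance exceeds Rmax can never be NN.
    return [
        name
        for name, d in disks.items()
        if max(0.0, center_dist(d) - d[2]) <= r_max
    ]


# Hypothetical uncertainty disks around a crisp query point at the origin.
objects = {
    "Tr1": (1.0, 0.0, 0.5),
    "Tr2": (0.0, 1.8, 0.5),
    "Tr4": (5.0, 5.0, 0.5),
}
candidates = nn_candidates((0.0, 0.0), objects)
print(candidates)
```

Only the surviving candidates need to enter the integral for the NN-probability, which is the expensive part of the computation.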

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ and uncertainty disks of Tr1 … Tr5 with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over uncertainty disks of diameter 2r]

Example: assuming a uniform pdf of possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$

Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq

67

[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of diameter 4r]

Let Viq = Vi + (−Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

Example

[Figure: distance functions TR1 and TR2 over time (time dimension on the horizontal axis, distance on the vertical axis), their lower envelope LE12 over [tb, te], and the critical time-points t11 and t12]
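The "at most two intersections" observation follows from squaring both distance functions, which leaves a quadratic equation in t. The sketch below assumes that each squared distance has the form A·t² + B·t + C, derived from a relative start offset and a constant relative velocity; the helper names and the numbers in the example are hypothetical.

```python
import math
from typing import List, Tuple

Quad = Tuple[float, float, float]  # (A, B, C) with d(t)^2 = A*t^2 + B*t + C


def squared_distance_coeffs(dx0: float, dy0: float, vx: float, vy: float) -> Quad:
    """Coefficients of the squared distance for linear relative motion."""
    return (vx * vx + vy * vy, 2.0 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)


def critical_time_points(f: Quad, g: Quad, tb: float, te: float) -> List[float]:
    """Times in [tb, te] at which two distance functions are equal."""
    # Distances are non-negative, so d_f(t) = d_g(t) iff d_f(t)^2 = d_g(t)^2.
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Two hypothetical relative trajectories with respect to the query object.
f = squared_distance_coeffs(dx0=-4.0, dy0=1.0, vx=1.0, vy=0.0)  # (t-4)^2 + 1
g = squared_distance_coeffs(dx0=0.0, dy0=2.0, vx=0.5, vy=0.0)   # 0.25*t^2 + 4
points = critical_time_points(f, g, tb=0.0, te=10.0)
print(points)
```

Between two consecutive critical time-points the order of the two distance functions cannot change, which is what the lower-envelope construction relies on.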

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te], merged into LE1234 with the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t)

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4, with a segment between m and p1]

Probability of falling inside segment m-p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\left(a \in S \mid PL_p(t)\right)$$

e.g., uniform pdf
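Combining the two probabilities amounts to a weighted sum over the possible paths. The sketch below assumes two equally likely paths (labelled `path_1` and `path_2`, hypothetical names) and assumes that, at t = 4, the segment of interest covers half of the possible locations on the first path and none on the second; under these assumptions it reproduces the 0.5 · 0.5 = 0.25 of the example.

```python
from typing import Dict


def prob_in_segment(
    path_probs: Dict[str, float], in_segment_given_path: Dict[str, float]
) -> float:
    """Sum over possible paths of Pr(path) * Pr(object in segment | path)."""
    return sum(
        p * in_segment_given_path.get(path, 0.0)
        for path, p in path_probs.items()
    )


# Assumed: a uniform pdf over two possible paths.
path_probs = {"path_1": 0.5, "path_2": 0.5}
# Assumed: the share of the possible locations at t = 4 that lies inside the
# segment, per path, under a uniform pdf over the possible locations.
in_segment_given_path = {"path_1": 0.5, "path_2": 0.0}

result = prob_in_segment(path_probs, in_segment_given_path)
print(result)
```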

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query

– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink stay "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: space-time diagram (position over time t) of a car on a highway, with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the region reachable between the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; the query window Q covers 50% and 30% of the possible positions at the two points in time]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: as before]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; from an inner hole the ball moves left with 0.4, stays with 0.2, moves right with 0.4; at a border hole the probabilities are 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

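› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$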

94

How can we model this

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
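The hits on the wooden board can be replayed as repeated vector-matrix products. The sketch below rebuilds the 9 x 9 transition matrix of the example in plain Python; the function names are hypothetical, and a real implementation would likely use a sparse-matrix library instead of nested lists.

```python
from typing import List


def build_transition_matrix(n: int = 9) -> List[List[float]]:
    """Tridiagonal transition matrix of the wooden-board example."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                           # inner hole: left, stay, right
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist: List[float], m: List[List[float]]) -> List[float]:
    """One hit: multiply the row vector dist with the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_transition_matrix()
initial = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball in third hole
after_first = step(initial, M)
after_second = step(after_first, M)

after_40 = initial
for _ in range(40):
    after_40 = step(after_40, M)

print([round(x, 2) for x in after_first])
print([round(x, 2) for x in after_second])
print([round(x, 2) for x in after_40])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, so further hits move the distribution towards it regardless of the initial position.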

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2 and s3 with the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

t = 0: (1, 0, 0)

99

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

100

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

101

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0, 0.16, 0.04), counting only the worlds that did not intersect the window at t = 2

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

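State order: (s1, s2, s3, s4), where s4 is the new state for the winner trajectories

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$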
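The window query of the running example can be replayed with the two 4 x 4 matrices: one for transitions that end outside the query time interval and one for transitions that end inside it, where moves into s1 or s2 are redirected to the extra absorbing state s4. The sketch below is an illustration in plain Python; the names `M_MINUS`, `M_PLUS` and `window_probability` are hypothetical.

```python
from typing import List, Set

# State order: s1, s2, s3, s4 (s4 absorbs every world that hit the window).
M_MINUS = [  # transition ends at a time outside the query interval
    [0.0, 0.0, 1.0, 0.0],
    [0.6, 0.0, 0.4, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
M_PLUS = [   # transition ends at a time inside the query interval:
    [0.0, 0.0, 1.0, 0.0],  # moves into s1 or s2 are redirected to s4
    [0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]


def window_probability(start: List[float], window_times: Set[int], t_end: int) -> float:
    """Probability of visiting s1 or s2 at least once during window_times."""
    dist = list(start)
    for t in range(1, t_end + 1):
        m = M_PLUS if t in window_times else M_MINUS
        dist = [sum(dist[i] * m[i][j] for i in range(4)) for j in range(4)]
    return dist[3]  # mass collected in the absorbing state s4


# The car starts in s1 at t = 0; the query window is {s1, s2} x [2, 3].
result = window_probability([1.0, 0.0, 0.0, 0.0], window_times={2, 3}, t_end=3)
print(round(result, 2))
```

Because worlds that hit the window are moved to s4 and never leave it, no world is counted twice, and the cost stays at one vector-matrix product per tick.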

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time, with one observation at t0 (left) and with two observations at t0 and t1 (right)]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si-, corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+, corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

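[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3; ¬∎ marks worlds that have not intersected the window]

State order: (S1-, S2-, S3-, S1+, S2+, S3+)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$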

107

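Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window ($\blacksquare$), given that the object was seen in s3 ($obs$)?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$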
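The conditioning on a second observation can be replayed by lifting the 3-state chain to the six classes S1-, S2-, S3-, S1+, S2+, S3+ and applying Bayes' theorem to the resulting distribution. The sketch below assumes the running example (start in s1 at t = 0, query window {s1, s2} x [2, 3], second observation in s3 at t = 3); the helper names are hypothetical.

```python
from typing import List, Set

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
WINDOW_STATES = {0, 1}  # indices of s1 and s2


def lifted_matrix(m: List[List[float]], in_window: bool) -> List[List[float]]:
    """Transition matrix over (S1-, S2-, S3-, S1+, S2+, S3+)."""
    n = len(m)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            hit = in_window and j in WINDOW_STATES
            # From a "-" class: switch to the "+" class on entering the window.
            big[i][j + n if hit else j] += m[i][j]
            # From a "+" class: the world stays a hit world forever.
            big[n + i][n + j] += m[i][j]
    return big


def propagate(start: List[float], window_times: Set[int], t_end: int) -> List[float]:
    dist = list(start)
    for t in range(1, t_end + 1):
        big = lifted_matrix(M, in_window=t in window_times)
        size = len(big)
        dist = [sum(dist[i] * big[i][j] for i in range(size)) for j in range(size)]
    return dist


dist = propagate([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], window_times={2, 3}, t_end=3)
# Observation at t = 3: the object is in s3, i.e. in class S3- or S3+.
posterior = dist[5] / (dist[2] + dist[5])
print([round(x, 2) for x in dist], round(posterior, 2))
```

Only the two classes that are consistent with the observation enter the ratio, which is the last form of Bayes' theorem: hit-and-observed mass divided by the total observed mass.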


40

Generating Functions: Formally

For each object o_i, let p_i denote the probability that o_i lies inside the query region. Consider the following generating function [2]:

\[ F(x) = \prod_{i=1}^{N} (1 - p_i + p_i x) = \sum_{j=0}^{N} c_j x^j \]

In the expanded polynomial, the coefficient c_j of the monomial x^j equals the probability that exactly j objects are inside the query region.

[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
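The polynomial expansion described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `count_distribution` and the three probabilities 0.8, 0.2, and 0.4 are assumptions chosen for the example.

```python
from typing import List


def count_distribution(probs: List[float]) -> List[float]:
    """Expand prod_i (1 - p_i + p_i * x).

    The returned list holds the coefficients: entry j is the probability
    that exactly j objects lie inside the query region.
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # object outside: count unchanged
            nxt[j + 1] += c * p      # object inside: count grows by one
        coeffs = nxt
    return coeffs


# Illustrative (assumed) probabilities of three independent objects.
dist = count_distribution([0.8, 0.2, 0.4])
print(dist)
```

Each object multiplies the polynomial by one linear factor, so N objects cost O(N^2) coefficient updates instead of enumerating 2^N possible worlds.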

41

Count Queries on Uncertain Data

[Figure: running example with query region q and uncertain objects; object labels A, B, C, H, and H3; probability labels 0.8, 0.2, and 0.4. The example formulas that are built up step by step on slides 41 to 44 did not survive extraction.]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.
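To make condition II concrete, the following sketch groups all possible worlds of a tiny database by the result of a count query. It is only an illustration under assumed data (three independent objects with probabilities 0.8, 0.2, and 0.4); the enumeration itself is exponential and serves solely to show that 2^N worlds collapse into N + 1 classes.

```python
from itertools import product


def class_probabilities(probs):
    """Enumerate all 2^N possible worlds and group them by result count."""
    classes = [0.0] * (len(probs) + 1)
    for world in product([False, True], repeat=len(probs)):
        weight = 1.0
        for inside, p in zip(world, probs):
            weight *= p if inside else 1.0 - p
        # Worlds with the same number of objects inside the query region
        # are equivalent with respect to a count query.
        classes[sum(world)] += weight
    return classes


probs = [0.8, 0.2, 0.4]  # assumed example values
classes = class_probabilities(probs)
num_worlds = 2 ** len(probs)
print(num_worlds, "worlds,", len(classes), "classes:", classes)
```

In a real system the class probabilities would be computed directly in polynomial time (condition III), without materializing the worlds.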

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each sampled world.

The distribution of the sampled results is an unbiased approximation of the true distribution of results.
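A minimal sketch of this sampling scheme for a count query is shown below. The function name, the fixed seed, and the object probabilities are assumptions for illustration; objects are assumed to be independent.

```python
import random
from collections import Counter


def sample_count_distribution(probs, num_samples, seed=0):
    """Estimate P(exactly j objects in the query region) by sampling worlds."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counts = Counter()
    for _ in range(num_samples):
        # Draw one possible world: each object is inside with probability p.
        world = [rng.random() < p for p in probs]
        counts[sum(world)] += 1
    return [counts[j] / num_samples for j in range(len(probs) + 1)]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100)
large_estimate = sample_count_distribution([0.8, 0.2, 0.4], num_samples=20000)
print(estimate)
print(large_estimate)
```

With only 100 worlds the estimates fluctuate noticeably; the larger sample moves closer to the exact distribution.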

48

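Sampling Example

[Figure: the running example with query region q and probability labels 0.8, 0.2, and 0.4. The table of estimators and exact probabilities did not survive extraction.]

Drawing 100 possible worlds may yield a set of estimators for the result probabilities.

Compare these to the exact probabilities.

The estimators alone give no indication of the reliability or confidence of the estimations.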

49

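Sampling Confidences

[Figure: the running example with query region q and probability labels 0.8, 0.2, and 0.4. The table of estimators did not survive extraction; one figure label marks the true probability.]

Drawing 100 possible worlds may yield a set of estimators for the result probabilities.

Use statistical methods to assess the quality of the estimators.

E.g., the Wald test, which yields the confidence interval

\[ \hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} \]

where z_{1-\alpha/2} is the 100(1-\alpha/2) percentile of the standard normal distribution, \hat{p} is the estimated probability, and n is the number of samples.

At the chosen significance level, the true probability is in the interval [0.442, 0.638].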
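The Wald interval can be computed as sketched below. The values of the estimate and the significance level are not preserved in this transcript; the inputs p_hat = 0.54 and alpha = 0.05 are assumptions, picked because together with n = 100 samples they reproduce the interval [0.442, 0.638] quoted on the slide.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs, see the note above.
low, high = wald_interval(p_hat=0.54, n=100, alpha=0.05)
print(round(low, 3), round(high, 3))
```

The Wald interval is a normal approximation and is known to be unreliable for very small samples or probabilities close to 0 or 1.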

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory over X, Y, and Time, drawn up to the present time.]

The approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory

(location, time, velocity vector) updates

[Figure label: now ->]

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown speed in between samples, bounded by a maximum speed

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
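The bead (space-time prism) model admits a simple membership test: a location is possible at time t if it can be reached from the first sample and the second sample can still be reached from it, both under the maximum speed. The sketch below assumes free-space (Euclidean) motion; the function name `in_bead` and all sample values are hypothetical.

```python
from math import hypot


def in_bead(p, t, sample1, sample2, v_max):
    """Return True if location p is possible at time t between two exact samples.

    sample1 and sample2 are (x, y, time) tuples; v_max bounds the speed.
    """
    (x1, y1, t1), (x2, y2, t2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    # p must be reachable from the first sample by time t ...
    reach_from_start = hypot(p[0] - x1, p[1] - y1) <= v_max * (t - t1)
    # ... and the second sample must be reachable from p in the remaining time.
    reach_end_in_time = hypot(x2 - p[0], y2 - p[1]) <= v_max * (t2 - t)
    return reach_from_start and reach_end_in_time


start, end = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)  # hypothetical samples
print(in_bead((5.0, 0.0), 5.0, start, end, v_max=1.5))
print(in_bead((5.0, 6.0), 5.0, start, end, v_max=1.5))
```

At a fixed time t the set of possible locations is the intersection of two disks, which is why consecutive beads form a "necklace" over a sequence of samples.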

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the locations p1 (at time t1) and p2 (at time t2), with a location q and a time axis t.]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

A_nn(q): [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m-1), te])]

[Figure: trajectories Trq, Tr1, Tr2, and Tr3 in (X, Y, T) space, with time planes at T = tb, T = t1, and T = te.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te].

[Figure: trajectories Trq, Tr1, Tr2, and Tr3 with location uncertainty in (X, Y, T) space, with time planes at T = tb, T = tb1, T = t1, T = t11, and T = te.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: the IPAC-NN tree. The root is [TrQ, tb, te]. Each node is a tuple (trajectory, start time, end time, D), e.g., the first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); the nodes on deeper levels cover sub-intervals of their parent's interval.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: a 2D query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax, and Rd marked; annotation Rmin > Rmax for the pruned object.]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: R4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by a formula that did not survive extraction.

The probability that a given object Trj is the NN of Q is given by a second formula, also lost in extraction. Its meaning: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
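The two formulas referred to on this slide are not preserved in the transcript. A plausible reconstruction, following the verbal description and the standard formulation for probabilistic nearest-neighbor queries, is sketched below. The symbols P^{WD}_i, pdf_i, and D(Q, R_d) (the disk of radius R_d around Q) are assumptions introduced for this sketch and may differ from the notation on the original slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
% (a double integral of the location pdf over the disk D(Q, R_d)):
P^{WD}_i(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y) \, dx \, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d, all other objects are farther away,
% integrated over all distances up to R_max.
P^{NN}_j = \int_{0}^{R_{max}}
           \frac{d P^{WD}_j(R_d)}{d R_d}
           \prod_{i \neq j} \left( 1 - P^{WD}_i(R_d) \right) dR_d
```

The first expression is the "double integration" that the following slide contrasts with the quadruple integration needed once the query object is uncertain as well.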

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example:

[Figure: the uncertain query object TrQ and the uncertainty disks of Tr1 to Tr5, with Rmin and Rmax marked; Z1 and Z2 are two possible locations with dist(Z1, Z2) < Rmax.]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: the uniform pdfs pdf(Trq), pdf(Tr1), and pdf(Tr2) plotted over the x-y plane as cylinders of diameter 2r; axes x, y, and pdf.]

Example: assuming a uniform pdf of possible locations, two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi is within distance Rd of Vq.

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.

67

[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the x-y plane, each with a support of diameter 4r; axis ticks at -40, 0, and +40.]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
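Property 1 and the 4r-wide support of the convolved pdf can be checked numerically. The sketch below samples Vi and Vq uniformly from two disks of radius r and inspects the differences Viq = Vi - Vq; the centers, the radius, the sample size, and the seed are assumed values for illustration.

```python
import random
from math import cos, sin, sqrt, pi, hypot


def sample_disk(rng, center, r):
    """Draw a point uniformly at random from a disk."""
    radius = r * sqrt(rng.random())  # sqrt gives a uniform density over the area
    angle = 2.0 * pi * rng.random()
    return center[0] + radius * cos(angle), center[1] + radius * sin(angle)


def sample_differences(c_i, c_q, r, n, seed=0):
    """Sample Viq = Vi - Vq for independent, uniformly distributed Vi and Vq."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        vi = sample_disk(rng, c_i, r)
        vq = sample_disk(rng, c_q, r)
        diffs.append((vi[0] - vq[0], vi[1] - vq[1]))
    return diffs


c_i, c_q, r = (10.0, 4.0), (3.0, 1.0), 1.0  # assumed centroids and radius
diffs = sample_differences(c_i, c_q, r, n=20000)
mean = (sum(d[0] for d in diffs) / len(diffs),
        sum(d[1] for d in diffs) / len(diffs))
# Largest distance of a sampled difference from the centroid Ci - Cq.
max_dev = max(hypot(d[0] - (c_i[0] - c_q[0]), d[1] - (c_i[1] - c_q[1]))
              for d in diffs)
print(mean, max_dev)
```

The sample mean lands near Ci - Cq = (7, 3), and no difference lies farther than 2r from that centroid, i.e., the support of pdf(Viq) is a disk of diameter 4r.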

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is a hyperbola, since A > 0 (the formula itself did not survive extraction).

Now we have a collection of such distance functions (one for each i).
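The distance formula is missing from the transcript. Under the assumption of linear motion of the expected location, it can be rederived as follows; the symbols x_0 and y_0 (the relative expected position at t = 0) and the coefficients B and C are introduced here for the sketch, while A is the coefficient the slide refers to.

```latex
% Expected relative location at time t, assuming constant relative velocity:
%   (x_0 + V_{x,iq} t, \; y_0 + V_{y,iq} t)
d_{iq}(t) = \sqrt{(x_0 + V_{x,iq} t)^2 + (y_0 + V_{y,iq} t)^2}
          = \sqrt{A t^2 + B t + C}

% with
A = V_{x,iq}^2 + V_{y,iq}^2, \qquad
B = 2 (x_0 V_{x,iq} + y_0 V_{y,iq}), \qquad
C = x_0^2 + y_0^2
```

A is strictly positive whenever the two objects have different velocities, which is consistent with the slide's remark that the distance function is a hyperbola.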

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0).

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: Example. Lower envelope LE12 of two distance-trajectories TR1 and TR2, with the time marks tb, t11, and t12 and a critical time-point highlighted. NOTE: time dimension on the horizontal axis, dist on the vertical axis.]

70

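The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: three dist-over-time plots. First: lower envelope LE12 of TR1 and TR2, with time marks tb, t11, t12. Second: lower envelope LE34 of TR3 and TR4, with time marks tb, t31, te. Third: the merged lower envelope LE1234 of TR1 to TR4, with time marks tb, t11, t12, t31, te and the new critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?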
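The lower envelope of a set of distance functions can be illustrated with a deliberately simple brute-force sketch. It is not the O(N log N) divide-and-conquer construction named on the slide: it collects all pairwise critical time-points and picks the closest trajectory on every piece. Each function is assumed to be given by hypothetical coefficients (A, B, C) of its squared distance A·t² + B·t + C.

```python
from math import sqrt


def critical_times(f, g, tb, te):
    """Times in (tb, te) where two distance functions intersect (at most two)."""
    # Equal distances <=> equal squared distances <=> a quadratic equation in t.
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - sqrt(disc)) / (2 * a),
                                     (-b + sqrt(disc)) / (2 * a)]
    return [t for t in roots if tb < t < te]


def lower_envelope(funcs, tb, te):
    """Brute force: split [tb, te] at all critical times, take the minimum per piece."""
    names = list(funcs)
    cuts = {tb, te}
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            cuts.update(critical_times(funcs[n1], funcs[n2], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        best = min(names, key=lambda n: funcs[n][0] * mid * mid
                   + funcs[n][1] * mid + funcs[n][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)  # extend the previous piece
        else:
            envelope.append((best, lo, hi))
    return envelope


# Hypothetical squared distances: (t - 2)^2 + 1 and (t - 8)^2 + 1.
funcs = {"TR1": (1.0, -4.0, 5.0), "TR2": (1.0, -16.0, 65.0)}
envelope = lower_envelope(funcs, 0, 10)
print(envelope)
```

The result has the same shape as the time-parameterized answer of a continuous NN query: a list of (trajectory, interval start, interval end) entries.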

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is farther than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: two dist-over-time plots of the distance functions TR1 to TR7 over [tb, te], with time marks t1, t2, and t3. The first shows the lower envelope and the pruning band of width 4r above it; the second shows the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti (the formal definition did not survive extraction).

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t) (the formal definition did not survive extraction).

73

Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4, with a segment m highlighted.]

Probability of falling inside segment m: p1 = 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

\[ \Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \cap PL_p(t)) \]

(e.g., uniform pdf)
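The combination of path probability and location probability can be sketched as follows, assuming a uniform pdf over the possible locations along each path. The data layout (one tuple per possible path with intervals measured as offsets along the path) and all numbers are hypothetical; they are chosen so that the result matches the 0.5 · 0.5 = 0.25 example.

```python
def prob_in_segment(paths):
    """Sum over possible paths: Pr(path) * share of possible locations inside S.

    Each entry is (path_prob, possible, segment), where `possible` is the
    interval of possible locations along the path at the query time and
    `segment` is the part of the path covered by S (None if S is elsewhere).
    """
    total = 0.0
    for path_prob, possible, segment in paths:
        if segment is None:
            continue
        lo = max(possible[0], segment[0])
        hi = min(possible[1], segment[1])
        overlap = max(0.0, hi - lo)
        # Uniform pdf: probability proportional to the covered length.
        total += path_prob * overlap / (possible[1] - possible[0])
    return total


paths = [
    (0.5, (2.0, 6.0), (4.0, 6.0)),  # S covers half of the possible locations
    (0.5, (1.0, 5.0), None),        # S does not lie on this path
]
result = prob_in_segment(paths)
print(result)
```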

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1.

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree that is rooted in q and contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q. Annotations: expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query.]
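The network-expansion step can be sketched as a Dijkstra search that stops at network distance r. This is only one plausible way to realize it; the graph, its edge weights, and the function name are hypothetical, and an edge is treated as "in range" as soon as one of its endpoints is closer than r to q.

```python
import heapq


def network_expansion(graph, q, r):
    """Bounded Dijkstra: vertices closer than r to q, plus the edges in range."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            # u is closer than r to q, so this edge is at least partly in range.
            edges.add(frozenset((u, v)))
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


# Hypothetical undirected road network: vertex -> [(neighbour, length), ...]
graph = {
    "q": [("A", 2.0), ("D", 10.0)],
    "A": [("q", 2.0), ("B", 3.0)],
    "B": [("A", 3.0), ("C", 4.0)],
    "C": [("B", 4.0)],
    "D": [("q", 10.0)],
}
dist, edges = network_expansion(graph, "q", r=6.0)
print(dist, sorted(tuple(sorted(e)) for e in edges))
```

Only the movement R-trees of the returned edges would then have to be fetched in the filtering step.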

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
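As an illustration of error-bounded data reduction, the sketch below implements the basic recursive Douglas-Peucker line simplification (not the sped-up variant of the cited paper). The parameter `epsilon` plays the role of the acceptable error; the sample track is hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original line."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [points[0], points[-1]]  # everything in between can be dropped
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]  # hypothetical samples
simplified = douglas_peucker(track, epsilon=0.1)
print(simplified)
```

The dropped points are exactly the ones a receiver can no longer distinguish, which is the sense in which data reduction introduces a bounded location uncertainty.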

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

[The marks that indicate which approach supports which property did not survive extraction.]

82

Merge the approaches

› Idea
– Discretize the time
– For each time instant, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

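A highway example…

[Figure: position (pos) of a car on a highway over time (t), with a query region Q.]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the reachable positions bounded by vmin and vmax.]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with vmin, vmax, and the query region Q.]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; at the two time instants covered by Q, the query region covers 50% and 30% of the possible positions.]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

87

A highway example…

[Figure: as before, with a trajectory that connects the two time instants only by violating the speed constraints.]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before.]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5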
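The gap between the two results can be reproduced with a toy set of possible worlds. The ten equally likely trajectories below are hypothetical; they are constructed so that every trajectory inside Q at the second time instant was already inside Q at the first one, which is one way the speed constraints could couple the two time instants.

```python
# Each world records whether the car is inside Q at the first and at the
# second time instant. All ten worlds are assumed to be equally likely.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

p_first = sum(a for a, _ in worlds) / len(worlds)   # marginal at first instant
p_second = sum(b for _, b in worlds) / len(worlds)  # marginal at second instant

# Treating the two time instants as independent overestimates the result.
independent = 1 - (1 - p_first) * (1 - p_second)

# Possible-worlds semantics: count the worlds that hit Q at least once.
dependent = sum(a or b for a, b in worlds) / len(worlds)

print(p_first, p_second, independent, dependent)
```

Both computations use the same marginals (0.5 and 0.3); only the one that looks at whole trajectories yields 0.5.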

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model that can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a wooden board with a row of holes. Probability labels for an interior hole: 0.4, 0.2, 0.4; for a border hole: 0.6, 0.4.]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

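How can we model this

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[ M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix} \]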

94

How can we model this

› First hit:
\[ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \]

› Second hit:
\[ (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0) \]

› 40th hit:
\[ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08) \]
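The vector-matrix products on this slide can be replayed with a few lines of plain Python. The helper names `build_matrix` and `step` are assumptions of this sketch; the matrix entries follow the nine-bucket example.

```python
def build_matrix(n=9):
    """Transition matrix of the bucket example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4  # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, matrix):
    """One hit: multiply the row vector dist with the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third bucket
first = step(start, M)
second = step(first, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

# The distribution shown for the 40th hit is a fixed point of M.
stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])
print([round(x, 2) for x in after_40])
```

Repeated hits move the distribution towards the stationary one; the printed 40-step vector shows how close the chain is after 40 hits when started in the third bucket.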

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: state-transition diagram over the states s1, s2, and s3, with the transition probabilities of M as edge labels.]

Note: We have an exponential number of possible paths the car might take.

98

ST - Window Queries

[Figure: the states s1, s2, and s3 unrolled over t = 0, 1, 2, 3. The state vector at each time is obtained by multiplying the previous vector with M.]

t = 0: (1, 0, 0)

99

t = 1: (0, 0, 1)

100

t = 2: (0, 0.8, 0.2)

101

t = 3: (0.48, 0.16, 0.36)

102

t = 3, counting only the worlds that were not already in s1 or s2 at t = 2: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

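ST - Window Queries

\[ M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

State vectors over (s1, s2, s3, s4), where s4 is the new state:

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)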
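The state vectors above can be reproduced with the sketch below. It assumes the following reading of the two matrices, which is consistent with the numbers on the slide but not stated explicitly in the transcript: M- is applied for transitions into time instants outside the query interval, M+ for transitions into time instants inside it, and the fourth state absorbs every trajectory that has entered s1 or s2 during the interval.

```python
def step(dist, matrix):
    """Multiply the row vector dist with a transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Outside the query interval: ordinary transitions, absorbing state untouched.
M_MINUS = [[0, 0, 1, 0],
           [0.6, 0, 0.4, 0],
           [0, 0.8, 0.2, 0],
           [0, 0, 0, 1]]

# Inside the query interval: transitions into s1 or s2 are redirected to s4.
M_PLUS = [[0, 0, 1, 0],
          [0, 0, 0.4, 0.6],
          [0, 0, 0.2, 0.8],
          [0, 0, 0, 1]]


def window_probability(start, window_times, t_end):
    """Probability of visiting the window states at least once (start outside)."""
    dist = list(start)
    for t in range(1, t_end + 1):
        dist = step(dist, M_PLUS if t in window_times else M_MINUS)
    return dist[-1]  # mass collected in the absorbing state s4


result = window_probability([1, 0, 0, 0], window_times={2, 3}, t_end=3)
print(round(result, 2))
```

The sketch does not handle a query interval that contains t = 0; in that case the start vector itself would have to be redirected first.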

104

Multiple Observations

› So far we had only one observation, from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, and s3 unrolled over t = 0, 1, 2, 3.]

106

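Multiple Observations

State vectors over (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds with ¬∎ (window not intersected) and the classes Si+ hold the worlds with ∎ (window intersected):

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[ M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \]

\[ M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix} \]

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]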

107

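Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

\[ P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} \]

Here ∎ denotes the event that the trajectory intersects the query window, and obs denotes the observation of the object in s3.

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

\[ \frac{0.32}{0.32 + 0.04} = 0.89 \]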
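The conditioning on a second observation can be sketched with the doubled state space. The helper `doubled` and the convention that transitions into the window during the query interval move a world from class Si- to class Sj+ are assumptions of this sketch; they reproduce the 6x6 matrices and the result 0.89 shown on the slides.

```python
def step(dist, matrix):
    """Multiply the row vector dist with a transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def doubled(matrix, window_states, in_window_time):
    """Build the 2|S| x 2|S| matrix over the classes (S-, S+)."""
    n = len(matrix)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = matrix[i][j]
            hit = in_window_time and j in window_states
            big[i][j + n if hit else j] += p  # from S_i-: may switch to S_j+
            big[i + n][j + n] += p            # from S_i+: stays a hit world
    return big


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}  # s1 and s2
WINDOW_TIMES = {2, 3}   # T = [2, 3]

dist = [1, 0, 0, 0, 0, 0]  # first observation: in s1 at t = 0, window not hit
for t in range(1, 4):
    dist = step(dist, doubled(M, WINDOW_STATES, t in WINDOW_TIMES))

# Second observation: the object is seen in s3 at t = 3.
seen = 2
posterior = dist[seen + 3] / (dist[seen] + dist[seen + 3])
print([round(x, 2) for x in dist], round(posterior, 2))
```

Only the two classes of the observed state enter the final ratio; all worlds that end in s1 or s2 contradict the observation and drop out.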

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: an index entry P_oid in the spatio-temporal space; axes: time and location.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
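One standard way to draw trajectories that conform to two observations is to re-weight each transition by the probability of still reaching the later observation. The sketch below illustrates that idea on a three-state chain; it is not claimed to be the cited paper's exact procedure, and the function names, the matrix, and the observations are assumptions.

```python
import random


def backward(matrix, end_state, steps):
    """beta[t][s] = probability of being in end_state at time `steps`, given state s at t."""
    n = len(matrix)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][end_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for s in range(n):
            beta[t][s] = sum(matrix[s][s2] * beta[t + 1][s2] for s2 in range(n))
    return beta


def sample_conforming(matrix, start_state, end_state, steps, rng):
    """Sample one trajectory that starts and ends at the observed states."""
    beta = backward(matrix, end_state, steps)
    if beta[0][start_state] == 0.0:
        raise ValueError("observations are contradictory under this model")
    path = [start_state]
    for t in range(steps):
        s = path[-1]
        # Adapted transition probabilities: prior transition times the
        # probability of still reaching the later observation.
        weights = [matrix[s][s2] * beta[t + 1][s2] for s2 in range(len(matrix))]
        path.append(rng.choices(range(len(matrix)), weights=weights)[0])
    return path


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
rng = random.Random(1)  # fixed seed only to make the sketch repeatable
# Assumed observations: state s1 (index 0) at t = 0 and state s3 (index 2) at t = 3.
paths = [sample_conforming(M, 0, 2, 3, rng) for _ in range(200)]
print(paths[:5])
```

Unlike rejection sampling, no drawn trajectory has to be discarded, because every sample is consistent with both observations by construction.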

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric Models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


41

Count Queries on Uncertain Data

[Figure: query q and uncertain objects A, B and C with probability labels 0.8, 0.2 and 0.4. The worked example is built up step by step over slides 41 to 44; its formulas were part of the figure.]

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.
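The three conditions can be illustrated with a count query. The sketch below assumes that each object lies inside the query region independently, and that the labels 0.8, 0.2 and 0.4 from the running example are those probabilities (an assumption, since the figure is not preserved). All worlds with the same count form one equivalence class, so there are only n + 1 classes, and a generating-function style polynomial product yields their probabilities in O(n²) time. The function name `count_distribution` is illustrative.

```python
def count_distribution(probabilities):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    Expands the product of (1 - p_i) + p_i * x one factor at a time;
    the coefficient of x^k is the probability of the class "k objects qualify".
    """
    coeffs = [1.0]
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # this object does not qualify
            nxt[k + 1] += c * p       # this object qualifies
        coeffs = nxt
    return coeffs


# Assumed per-object probabilities taken from the figure labels.
dist = count_distribution([0.8, 0.2, 0.4])
print(dist)  # approximately [0.096, 0.472, 0.368, 0.064]
```

Enumerating the 2³ possible worlds directly would give the same four numbers; the point of the equivalence classes is that the cost no longer grows exponentially with the number of objects.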

46

The Paradigm of Equivalent Worlds

47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
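A minimal sketch of this sampling scheme for a count query, assuming independent objects with the probabilities 0.8, 0.2 and 0.4 of the running example (the mapping of labels to objects is an assumption). The helper name `sample_count_distribution` and the fixed seed are illustrative choices.

```python
import random
from collections import Counter


def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Estimate P(count = k) by materializing num_worlds possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_worlds):
        # One possible world: each object independently qualifies or not.
        world = [rng.random() < p for p in probabilities]
        counts[sum(world)] += 1  # evaluate the count query on this world
    return [counts[k] / num_worlds for k in range(len(probabilities) + 1)]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

With only 100 worlds the estimates fluctuate noticeably around the exact values, which motivates the confidence statements on the following slides.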

48

Sampling Example

[Figure: query q and uncertain objects A, B and C with probability labels 0.8, 0.2 and 0.4.]

Drawing 100 possible worlds may yield the following estimators:

Compare to the exact probabilities:

No indication of reliability or confidence of estimations

49

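Sampling Confidences

[Figure: query q and uncertain objects A, B and C with probability labels 0.8, 0.2 and 0.4.]

Drawing 100 possible worlds may yield the following estimators:

Use statistical methods to assess the quality of estimators

E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/|S|}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]

True probability: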
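The interval quoted above can be reproduced with a short sketch. The inputs are inferred rather than taken from the slide: an estimate of 54 qualifying worlds out of 100 and a significance level of 0.05 are assumptions, chosen because they yield exactly [0.442, 0.638]. The function name `wald_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a sampled probability estimate."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: 54 of 100 sampled worlds satisfy the query, alpha = 0.05.
low, high = wald_interval(54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The half-width shrinks with the square root of the number of sampled worlds, so halving the interval requires roughly four times as many samples.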

50

Uncertain Spatial Data Management: Summary

› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

› Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

› Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the X, Y and Time axes up to the present time, together with its projection onto the XY-plane, the 2D route.]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
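The bead (space-time prism) model can be sketched as a simple membership test: a location is possible at time t if it can be reached from the earlier sample and the later sample can still be reached from it, both under the speed bound. This is a minimal sketch assuming free-space motion and Euclidean distance; the names `in_bead` and `v_max` are illustrative.

```python
from math import hypot


def in_bead(sample_a, sample_b, v_max, x, y, t):
    """Check whether (x, y) is a possible location at time t.

    sample_a and sample_b are exact (x, y, t) samples with t_a <= t_b;
    v_max is the assumed bound on the speed in between them.
    """
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = hypot(x - xa, y - ya) <= v_max * (t - ta)
    can_still_reach_b = hypot(xb - x, yb - y) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b


a, b = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
print(in_bead(a, b, 2.0, 5.0, 6.0, 5.0))  # True: inside both cones
print(in_bead(a, b, 2.0, 5.0, 9.0, 5.0))  # False: too far off the direct route
```

A necklace is then the sequence of such beads between consecutive samples.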

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network, with labels p1, p2, q, t1, t2 and the time axis t.]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root [TrQ, tb, te] has children of the form (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each child is refined into nodes over sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), and so on down the levels.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked; for the pruned object, Rmin > Rmax.]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by:

The probability that a given object Trj is the NN of Q is given by:

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
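The two formulas themselves are not preserved in this transcript. One plausible way to write them down, following the verbal description above, is sketched here; the notation ($P^{WD}$ for the within-distance probability, $pdf^{WD}$ for its derivative with respect to $R_d$, and $D(Q, R_d)$ for the disk of radius $R_d$ around $Q$) is an assumption and may differ from the symbols on the original slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q:
P^{WD}_{i,Q}(R_d) = \iint_{D(Q,\,R_d)} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d and every other object is farther away,
% integrated over all feasible R_d (upper bound R_{max}).
P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d)
               \prod_{i \neq j} \bigl(1 - P^{WD}_{i,Q}(R_d)\bigr)\, dR_d,
\qquad pdf^{WD}_{j,Q}(R_d) = \frac{d}{dR_d} P^{WD}_{j,Q}(R_d)
```

The product term relies on the objects' locations being independent of each other.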

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1 to Tr5 with Rmin and Rmax marked; two points Z1 and Z2 with dist(Z1, Z2) < Rmax.]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each a cylinder of diameter 2r and height 1/(πr²).]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
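The convolution argument can be illustrated in a simplified setting. The sketch below assumes one-dimensional, discretized locations with made-up probability mass functions instead of the 2D uniform disks on the slide; `difference_pmf` is an illustrative name. It computes the pmf of Viq = Vi + (–Vq) and then the probability of being within distance Rd.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete V_i and V_q.

    This is the convolution of pmf_i with the pmf of -V_q.
    """
    result = defaultdict(float)
    for vi, pi in pmf_i.items():
        for vq, pq in pmf_q.items():
            result[vi - vq] += pi * pq
    return dict(result)


# Hypothetical 1D locations: the object is at 4 or 5, the query at 0 or 1.
pmf_i = {4: 0.5, 5: 0.5}
pmf_q = {0: 0.5, 1: 0.5}
pmf_iq = difference_pmf(pmf_i, pmf_q)

r_d = 4
within = sum(p for d, p in pmf_iq.items() if abs(d) <= r_d)
print(pmf_iq, within)  # {4: 0.5, 3: 0.25, 5: 0.25} 0.75
```

After the convolution, the uncertain query can be treated as a crisp query at the origin against the "difference" objects, which is what the following slides exploit.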

67

[Figure: the convolved pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane; their support has diameter 4r.]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is:

(a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i)
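The distance formula itself is not preserved in the transcript. For linear relative motion it has the form d(t) = sqrt(A·t² + B·t + C), which matches the remark about the hyperbola and A > 0. The sketch below derives the coefficients under that assumption; the relative start position (x0, y0) and the function names are illustrative.

```python
from math import sqrt


def distance_coefficients(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A*t^2 + B*t + C.

    (x0, y0) is the relative expected position at t = 0 and (vx, vy) the
    relative velocity, i.e. the components Vxiq and Vyiq.
    """
    a = vx * vx + vy * vy
    b = 2.0 * (x0 * vx + y0 * vy)
    c = x0 * x0 + y0 * y0
    return a, b, c


def distance_at(coeffs, t):
    """Distance of the expected location from the origin at time t."""
    a, b, c = coeffs
    return sqrt(max(a * t * t + b * t + c, 0.0))


coeffs = distance_coefficients(3.0, 4.0, 1.0, 0.0)
print(coeffs)                    # (1.0, 6.0, 25.0)
print(distance_at(coeffs, 0.0))  # 5.0
```

A is the squared relative speed, so A > 0 whenever the two objects do not move with identical velocity.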

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of two distance-trajectories TR1 and TR2, plotted as distance over time (time dimension on the horizontal axis); the critical time-points t11 and t12 are marked after tb.]
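A minimal sketch of how critical time-points and the lower envelope of one pair could be computed, assuming each distance function is given by coefficients (A, B, C) of d(t) = sqrt(A·t² + B·t + C). Setting two such functions equal gives a quadratic in t, hence at most two intersections. The function names and the example coefficients are illustrative.

```python
from math import sqrt


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions intersect."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


def lower_envelope_of_pair(f, g, tb, te):
    """Pieces (start, end, index) with index 0 for f and 1 for g."""
    inner = [t for t in critical_time_points(f, g, tb, te) if tb < t < te]
    cuts = [tb] + inner + [te]
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = 0.5 * (start + end)
        # Comparing squared distances is enough, since sqrt is monotonic.
        f_mid = f[0] * mid * mid + f[1] * mid + f[2]
        g_mid = g[0] * mid * mid + g[1] * mid + g[2]
        pieces.append((start, end, 0 if f_mid <= g_mid else 1))
    return pieces


f = (1.0, -10.0, 26.0)  # d^2 = (t - 5)^2 + 1
g = (0.0, 0.0, 5.0)     # constant distance sqrt(5)
print(critical_time_points(f, g, 0.0, 10.0))    # [3.0, 7.0]
print(lower_envelope_of_pair(f, g, 0.0, 10.0))
```

The pieces of the envelope correspond to the time intervals of the first level of the IPAC-NN tree, each labeled with the trajectory that is closest in expectation.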

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three distance-over-time plots on [tb, te]. The lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234, which introduces the new critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2 and t3; shown are the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each path has probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

73

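Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4; segment m lies on one of the paths.]

Probability of falling inside segment m:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(a(t) \in S \mid p)$, where the second factor is taken over the possible locations $PL_p(t)$ (e.g. uniform pdf)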
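The combination of path probability and location probability can be sketched as follows. The sketch makes simplifying assumptions that are not on the slide: for each possible path, the possible locations at time t form a single interval on one edge, and the location is uniformly distributed over that interval. The edge names are borrowed from the example on the next slide; the interval bounds and the function name are hypothetical.

```python
def probability_in_segment(paths, segment_edge, segment_start, segment_end):
    """Probability that the object is inside a segment at a fixed time t.

    paths: list of (path_probability, edge, lo, hi), where [lo, hi] is the
    interval of possible locations on that edge at time t.
    """
    total = 0.0
    for path_probability, edge, lo, hi in paths:
        if edge != segment_edge or hi <= lo:
            continue
        overlap = max(0.0, min(hi, segment_end) - max(lo, segment_start))
        # Uniform pdf over the possible locations on this path.
        total += path_probability * overlap / (hi - lo)
    return total


# Two equally likely paths; the segment covers half of the possible
# locations of the first path at t = 4 (hypothetical numbers).
paths = [
    (0.5, "v1v5", 0.0, 4.0),
    (0.5, "v1v3", 0.0, 4.0),
]
p = probability_in_segment(paths, "v1v5", 0.0, 2.0)
print(p)  # 0.25
```

This mirrors the 0.5 · 0.5 = 0.25 computation above: probability of taking the path times probability of being inside the segment given that path.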

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q; the expansion tree "bubbles" along road-network segments. Annotation on the earliest/latest times of arrival: a possible path, but definitely not part of the answer to the range query.]

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error and save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
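To make the idea of an acceptable error concrete, here is a minimal sketch of the basic recursive Douglas-Peucker simplification on 2D points. It is the textbook variant, not the sped-up algorithm of the cited paper, and the sample points are made up. Every dropped point is guaranteed to lie within `epsilon` of the simplified polyline.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the original."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(route, 0.1))  # [(0, 0), (2, 0), (3, 3), (4, 0)]
```

The tolerance `epsilon` then plays the role of a known bound on the location uncertainty introduced by the reduction.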

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

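A highway example…

[Figure: position (pos) of a car over time (t) with a query window Q.]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the positions reachable between the speed bounds vmin and vmax.]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with the speed bounds vmin and vmax and the query window Q.]

Assuming uniform distribution…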

86

A highway example…

[Figure: position over time with query window Q; the car is inside Q with probability 50% at one time instant and 30% at another.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: as before.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of an inner hole (0.4 left, 0.2 stay, 0.4 right) and of a border hole (0.4 stay, 0.6 move to the neighbouring hole).]

92

A simple example

› Initial position

› After first hit: 0.4 0.2 0.4

› After second hit: 0.16 0.16 0.36 0.16 0.16

› After 40th hit: 0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08

93

How can we model this

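› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$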

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
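The vector-matrix products above can be checked with a few lines of plain Python. This is a minimal sketch; the helper names `build_matrix` and `step` are illustrative. The vector shown for the 40th hit is the stationary distribution of M, which repeated hits approach.

```python
def build_matrix(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(vector, matrix):
    """One hit: multiply the row vector with the transition matrix."""
    n = len(matrix)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
state = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third hole
first_hit = step(state, M)
second_hit = step(first_hit, M)
print([round(p, 2) for p in first_hit])
print([round(p, 2) for p in second_hit])

# The limiting vector from the slide is left unchanged by a further hit.
stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(p, 2) for p in step(stationary, M)])
```

In practice M is very sparse, so each step costs time proportional to the number of non-zero entries rather than to the square of the number of states.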

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2 and s3 with the transition probabilities of M on the edges.]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 unrolled over the time steps t=0, t=1, t=2 and t=3, with the transition probabilities of M on the edges.]

t=0: $(1, 0, 0)$

98

99

ST - Window Queries

t=1: $(0, 0, 1)$

100

ST - Window Queries

t=2: $(0, 0.8, 0.2)$

101

ST - Window Queries

t=3: $(0.48, 0.16, 0.36)$

102

ST - Window Queries

t=3, keeping only the worlds that did not intersect the window at t=2: $(0, 0.16, 0.04)$

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

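ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

States: s1, s2, s3, s4

t=0: $(1, 0, 0, 0)$, t=1: $(0, 0, 1, 0)$, t=2: $(0, 0, 0.2, 0.8)$, t=3: $(0, 0, 0.04, 0.96)$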
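The two matrices can be generated mechanically from M, as the following sketch shows. It assumes that the object starts outside the window at t=0 and that the extra state s4 is absorbing and collects all worlds that have entered the window; M– is used for transitions into time steps outside T and M+ for transitions into time steps inside T. The function names are illustrative.

```python
def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state."""
    n = len(m)
    hit = n
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing[:]]
    m_plus = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for target, p in enumerate(row):
            # Transitions into the window are redirected to the hit state.
            new_row[hit if target in window_states else target] += p
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(m, start, window_states, window_times, horizon):
    """Probability of being in window_states at least once during window_times."""
    m_minus, m_plus = window_matrices(m, window_states)
    vector = [1.0 if i == start else 0.0 for i in range(len(m))] + [0.0]
    for t in range(1, horizon + 1):
        vector = vec_mat(vector, m_plus if t in window_times else m_minus)
    return vector[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window_states={0, 1},
                            window_times={2, 3}, horizon=3)
print(round(result, 2))  # 0.96
```

The cost is one sparse vector-matrix product per time step, instead of an enumeration of the exponentially many possible paths.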

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time; the left one with a single observation at t0, the right one with observations at t0 and t1.]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2 and s3 unrolled over the time steps t=0, t=1, t=2 and t=3.]

106

Multiple Observations

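State order: (S1–, S2–, S3–, S1+, S2+, S3+), where Si+ holds the worlds that have intersected the window (∎) and Si– the worlds that have not (not ∎)

t=0: $(1, 0, 0, 0, 0, 0)$
t=1: $(0, 0, 1, 0, 0, 0)$
t=2: $(0, 0, 0.2, 0, 0.8, 0)$
t=3: $(0, 0.16, 0.04, 0.48, 0, 0.32)$

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$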

107

Bayes' Theorem

› Now what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

State vector at t=3 over (S1–, S2–, S3–, S1+, S2+, S3+): $(0, 0.16, 0.04, 0.48, 0, 0.32)$

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)}
= \frac{P(\blacksquare \wedge obs)}{P(obs)}
= \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
= \frac{0.32}{0.32 + 0.04} = 0.89$$
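The numbers of this example can be reproduced with a small sketch over the 2|S| classes. One assumption is made here that the transcript does not state explicitly: the matrices are applied in the order M–, M+, M– (the window is only tested on the transition into t=2), because that order yields exactly the state vectors shown on the slides. The helper names are illustrative.

```python
def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def block_minus(m):
    """M-: worlds keep their class (hit or not hit)."""
    n = len(m)
    top = [row[:] + [0.0] * n for row in m]
    bottom = [[0.0] * n + row[:] for row in m]
    return top + bottom


def block_plus(m, window_states):
    """M+: worlds entering a window state move to the hit classes."""
    n = len(m)
    rows = []
    for source in range(2 * n):
        row = [0.0] * (2 * n)
        already_hit = source >= n
        for target, p in enumerate(m[source % n]):
            hit = already_hit or target in window_states
            row[target + (n if hit else 0)] += p
        rows.append(row)
    return rows


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
N = len(M)
M_MINUS = block_minus(M)
M_PLUS = block_plus(M, window_states={0, 1})

# Assumed order of matrices for the transitions into t=1, t=2 and t=3.
vector = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for matrix in (M_MINUS, M_PLUS, M_MINUS):
    vector = vec_mat(vector, matrix)

observed = 2                          # object seen in s3 at t=3
hit_and_seen = vector[N + observed]   # P(hit and obs)
miss_and_seen = vector[observed]      # P(no hit and obs)
posterior = hit_and_seen / (hit_and_seen + miss_and_seen)
print([round(p, 2) for p in vector], round(posterior, 2))
```

Conditioning on the second observation thus amounts to keeping only the two classes of the observed state and renormalizing.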

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over the ST-space, with the axes time and location.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric Models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Count Queries on Uncertain Data

[Figure: count-query example with query q, uncertain objects A, B, C, labels H and H3, and the probabilities 0.8, 0.2, 0.4]

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time
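As a rough illustration of conditions II and III, the sketch below answers a count query with a generating function: all worlds with the same count k form one class, so there are only n + 1 classes, and their probabilities are the polynomial coefficients. The function name and the three probabilities are assumptions chosen for illustration; objects are assumed independent.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    Expands the generating function prod_i ((1 - p_i) + p_i * x).
    The coefficient of x^k is the total probability of the class of
    possible worlds in which exactly k objects satisfy the predicate.
    """
    coeffs = [1.0]  # the polynomial "1": no object processed yet
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # this object does not qualify
            nxt[k + 1] += c * p      # this object qualifies
        coeffs = nxt
    return coeffs


# Hypothetical qualification probabilities for three objects.
dist = count_distribution([0.8, 0.2, 0.4])
print([round(x, 3) for x in dist])  # [0.096, 0.472, 0.368, 0.064]
```

The loop touches O(n^2) coefficients in total, instead of enumerating the 2^n possible worlds.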


Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each sampled world

The distribution of sampled results is an unbiased estimate of the true distribution of results
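A minimal sketch of this sampling scheme for a count query, assuming independent objects. The function name, the seed parameter and the three probabilities are hypothetical and only serve the illustration.

```python
import random


def sample_count_distribution(probs, num_worlds, seed=0):
    """Estimate P(count = k) by materializing `num_worlds` possible worlds."""
    rng = random.Random(seed)
    counts = [0] * (len(probs) + 1)
    for _ in range(num_worlds):
        # One possible world: every object qualifies independently
        # with its own probability.
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    # The relative frequencies approximate the true result distribution.
    return [c / num_worlds for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

With only 100 worlds the estimates can deviate noticeably from the exact values, which is why a confidence statement is needed.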

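Sampling Example

[Figure: query q and uncertain objects A, B, C with the probabilities 0.8, 0.2, 0.4]

Drawing 100 possible worlds may yield a set of estimators (shown on the slide), which can be compared to the exact probabilities

The estimators alone give no indication of their reliability or confidence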

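Sampling Confidences

[Figure: query q and uncertain objects A, B, C with the probabilities 0.8, 0.2, 0.4; estimated interval with the true probability marked]

Drawing 100 possible worlds may yield a set of estimators (shown on the slide)

Use statistical methods to assess the quality of the estimators

E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]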
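The interval on this slide can be reproduced with a few lines. The sketch assumes an estimate of 0.54 from 100 sampled worlds at a significance level of 0.05; these inputs are inferred from the interval [0.442, 0.638] and are not stated explicitly in the extracted text.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    # Percentile of the standard normal distribution for a two-sided test.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: estimator 0.54 from 100 possible worlds.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The interval shrinks with the square root of the number of sampled worlds, so halving its width takes four times as many samples.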

Uncertain Spatial Data Management: Summary

› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

› Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

› Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

Model of a trajectory

[Figure: axes X, Y and Time; labels: present time, 2D route, 3D trajectory]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure label: now ->]

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

Modeling Uncertain Trajectories

• Model 2: Constraints
• Motion along a road network
• Uncertainty restricted to road segments

[Figure: road network with sample points p1 and p2 at times t1 and t2, and a position q at an intermediate time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

Modeling Uncertain Trajectories

• Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g. possible routes to be taken

Processing Algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the X-Y plane, with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 over the X-Y plane, with time marks T = tb, T = tb1, T = t1, T = t11, T = te]

Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Each deeper node again holds a trajectory, a sub-interval of its parent's interval and a label D, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], [Tri121, t11, t121, D121]]

Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q, uncertainty disks of Tr1 to Tr5, the radii Rmin and Rmax and a distance ring Rd; annotation: Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the pdf of Tri over the part of its uncertainty disk that lies within distance Rd of Q

The probability that a given object Trj is the NN of Q follows from this: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
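The two formulas did not survive extraction. A reconstruction in the standard form for this kind of query could look as follows; the symbols $P^{WD}$, $P^{NN}$, $D_i$ and $B(Q, R_d)$ are assumed notation, not necessarily the one used on the slide.

```latex
% Assumed notation: D_i is the uncertainty disk of Tr_i,
% B(Q, R_d) is the disk of radius R_d centred at Q.

% Probability that Tr_i is within distance R_d of Q (a double integral):
P^{WD}_{i}(R_d) = \iint_{D_i \cap B(Q, R_d)} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d while all other objects are farther away.
P^{NN}_{j} = \int_{0}^{R_{max}} \frac{d P^{WD}_{j}(R_d)}{d R_d}
             \prod_{i \neq j} \bigl( 1 - P^{WD}_{i}(R_d) \bigr)\, d R_d
```

The upper bound $R_{max}$ reflects the pruning observation: beyond $R_{max}$ at least one object is certainly closer, so the product vanishes.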

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ, uncertainty disks of Tr1 to Tr5, radii Rmin and Rmax, and two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the x-y plane, each over a disk of diameter 2r]

Example, assuming a uniform pdf over the possible locations: two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
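The convolution argument can be tried out on a discretized 1D toy case. The sketch below is only an illustration under assumed inputs: the two uniform pmfs over three cells and the threshold 3 are hypothetical, and the slide itself deals with continuous 2D pdfs.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete random variables.

    This is the convolution of the pmf of V_i with the pmf of -V_q.
    """
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)


# Hypothetical 1D locations: both objects uniform over three cells.
pmf_i = {4: 1 / 3, 5: 1 / 3, 6: 1 / 3}
pmf_q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

pmf_iq = difference_pmf(pmf_i, pmf_q)
# Probability that the two objects are within distance 3 of each other.
within = sum(p for d, p in pmf_iq.items() if abs(d) <= 3)
print(sorted(pmf_iq), round(within, 4))
```

The support of the difference is wider than either input (five cells instead of three), which mirrors the growth from 2r to 4r in the continuous case.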

[Figure: the convolved pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) plotted over the x-y plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs)
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{x,iq}^2 + V_{y,iq}^2$ and B, C determined by the start position of TRiq

A hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time points

Example: lower envelope of two distance trajectories

[Figure: distance functions TR1 and TR2 over time (time dimension on the horizontal axis), their lower envelope LE12, and the critical time points t11 and t12 after tb]

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) are merged into LE1234 over [tb, te], which introduces the new critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
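The sketch below shows critical time points and a lower envelope on a toy input. It is deliberately naive: it compares all pairs of distance functions instead of using the O(N log N) divide-and-conquer merge named above. It also assumes that each relative trajectory moves with constant velocity from a known start point; all function names and the two example trajectories are hypothetical.

```python
from math import sqrt


def dist_sq_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for a relative
    trajectory that starts at (x0, y0) and moves with velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def distance(coeffs, t):
    a, b, c = coeffs
    return sqrt(max(a * t * t + b * t + c, 0.0))


def critical_times(ci, cj, tb, te):
    """Times in (tb, te) where two distance functions intersect (at most two)."""
    # d_i(t) = d_j(t) is equivalent to a quadratic equation in t.
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            r = sqrt(disc)
            roots = [(-b - r) / (2 * a), (-b + r) / (2 * a)]
    return sorted(t for t in set(roots) if tb < t < te)


def lower_envelope(funcs, tb, te):
    """Naive envelope: list of (start, end, index of the closest trajectory)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = 0.5 * (start + end)  # the order cannot change inside a piece
        best = min(range(len(funcs)), key=lambda k: distance(funcs[k], mid))
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], end, best)  # merge equal neighbors
        else:
            pieces.append((start, end, best))
    return pieces


# Hypothetical relative trajectories (already expressed relative to Trq).
tr1 = dist_sq_coeffs(1.0, 0.0, 0.0, 0.0)   # constant distance 1
tr2 = dist_sq_coeffs(-2.0, 0.0, 1.0, 0.0)  # distance |t - 2|
print(lower_envelope([tr1, tr2], 0.0, 4.0))
# [(0.0, 1.0, 0), (1.0, 3.0, 1), (3.0, 4.0, 0)]
```

The two intersections at t = 1 and t = 3 are exactly the critical time points of this pair.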

Now, how about the IPAC-NN tree?

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r]

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform, each path in PPi is taken with probability 1/|PPi|

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4; segment m-p1 highlighted]

Probability of falling inside segment m-p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(PL_p^a(t) \in S)$, e.g. with a uniform pdf
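The combination is a weighted sum over the possible paths. The sketch below mirrors the 0.5 · 0.5 = 0.25 example; the second path, which does not cover the segment, is a hypothetical addition to complete the example.

```python
def prob_in_segment(paths):
    """Probability that the object is inside a segment S at time t.

    `paths` is a list of (path_probability, location_probability) pairs,
    where location_probability is the probability that the object's
    possible location along that path lies inside S at time t.
    """
    return sum(p_path * p_loc for p_path, p_loc in paths)


# Two equally likely paths (uniform pdf over the paths). On the first path,
# half of the possible locations at t = 4 lie inside the segment.
# The second path is assumed not to touch the segment at all.
paths = [(0.5, 0.5), (0.5, 0.0)]
print(prob_in_segment(paths))  # 0.25
```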

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query q; the expansion tree "bubbling" along road-network segments; annotation based on the earliest/latest times of arrival: a possible path, but definitely not part of the answer to the range query]

Temporal-continuous query

– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path

– Easy to prove that the actual QP is monotonic in between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding Up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
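To illustrate the "define an acceptable error, save on data size" idea, here is a sketch of the basic recursive Douglas-Peucker simplification. It is the plain textbook variant, not the sped-up version from the cited paper, and it ignores the time dimension; the sample track is made up.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points so that no dropped point is farther than epsilon
    from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__)
    if dists[worst] <= epsilon:
        return [points[0], points[-1]]  # the end points suffice
    split = worst + 1  # index of the farthest point in `points`
    left = douglas_peucker(points[:split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right  # the split point appears only once


# Hypothetical 2D track of eight location updates.
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(track, 0.5))
```

A larger epsilon keeps fewer points, which trades location accuracy (that is, added uncertainty) for bandwidth and energy.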

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

Merge the approaches

› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, and a query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints

Consideration of dependency: = 0.5

An adequate model

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with nine holes; an interior hole has the probabilities 0.4 (left), 0.2 (stay), 0.4 (right); a border hole has 0.4 (stay) and 0.6 (move to the neighbour)]

› Initial position: the ball is in the third hole

› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

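How can we model this

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$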

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
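The vector-matrix products above are easy to replay. The following sketch builds the nine-bucket matrix and propagates the start distribution; the helper names are made up, and plain Python lists stand in for a sparse-matrix library.

```python
def build_matrix(n=9):
    """Transition matrix of the board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                 # left border bucket
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:           # right border bucket
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                      # interior bucket
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(state, m):
    """One hit: multiply the row vector `state` with the matrix m."""
    n = len(m)
    return [sum(state[i] * m[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
state = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
for hit in range(1, 41):
    state = step(state, M)
    if hit in (1, 2, 40):
        print(hit, [round(p, 2) for p in state])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M, which the chain approaches as the number of hits grows.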

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state graph over s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take

Starting in s1 at t = 0, the state distribution evolves as
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)

The worlds in s2 at t = 2 (probability 0.8) have already hit the window. Propagating only the remaining worlds yields (0, 0.16, 0.04) at t = 3, of which the 0.16 in s2 is a further hit

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

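ST - Window Queries

States s1, s2, s3 plus the new winner state s4. M– is used for transitions to a time outside the query interval; M+ is used for transitions to a time inside it and redirects every transition into s1 or s2 to s4:

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)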
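The two-matrix construction can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the function names are invented, states are 0-indexed (s1 = 0), and dense lists replace sparse matrices.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state at index n."""
    n = len(m)
    hit = n
    # M-: the original chain; worlds that already hit stay in the hit state.
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    # M+: transitions that land in the window are redirected to the hit state.
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    m_plus[hit][hit] = 1.0
    for i in range(n):
        for j in range(n):
            target = hit if j in window_states else j
            m_plus[i][target] += m[i][j]
    return m_minus, m_plus


def window_probability(m, start, window_states, window_times, horizon):
    """P(object is in window_states at some time in window_times)."""
    m_minus, m_plus = window_matrices(m, window_states)
    v = [0.0] * (len(m) + 1)
    if 0 in window_times and start in window_states:
        v[len(m)] = 1.0  # the start observation itself hits the window
    else:
        v[start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p = window_probability(M, start=0, window_states={0, 1},
                       window_times={2, 3}, horizon=3)
print(round(p, 2))  # 0.96
```

Each time step costs one vector-matrix product, regardless of how many possible paths exist.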

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

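Multiple Observations

State order: (S1–, S2–, S3–, S1+, S2+, S3+), where ∎ marks worlds that hit the window and ¬∎ worlds that did not

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)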

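Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With obs denoting the observation in s3 and ∎ the event that the window was hit:

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$

State vector at t = 3 over (S1–, S2–, S3–, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89$$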
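A small sketch of the doubled state space and the Bayes step, using the three-state example. The function names and the 0-indexed state numbering are assumptions made for illustration.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """M- and M+ over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # M- = diag(M, M): hit and non-hit worlds evolve separately.
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Non-hit worlds move to the "+" copy when they land in the window.
            m_plus[i][n + j if j in window_states else j] += m[i][j]
            # Worlds that already hit the window stay in the "+" copy.
            m_plus[n + i][n + j] += m[i][j]
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, window_states={0, 1})

v = [1.0, 0, 0, 0, 0, 0]  # the object is observed in s1 at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, m_plus if t in {2, 3} else m_minus)

n, observed = 3, 2  # second observation: the object is seen in s3 at t = 3
posterior = v[n + observed] / (v[observed] + v[n + observed])
print([round(x, 2) for x in v], round(posterior, 2))
# [0.0, 0.0, 0.04, 0.48, 0.16, 0.32] 0.89
```

Conditioning on the second observation only needs the two entries S3– and S3+ of the final vector.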

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

Selected Works

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries in the location-time space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
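One way to read "adaptation of transition matrices" is a backward pass that reweights every transition by the probability of still reaching the next observation. The sketch below follows that idea for a single start and a single end observation. It is a generic forward-backward style construction under this assumption, not necessarily the algorithm of the cited paper, and all names are hypothetical.

```python
import random


def backward_messages(m, end_state, steps):
    """beta[t][s] = P(state at time `steps` is end_state | state s at time t)."""
    n = len(m)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][end_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_path(m, start, beta, rng):
    """Draw one trajectory that starts in `start` and ends in the observed state."""
    if beta[0][start] == 0.0:
        raise ValueError("observations are inconsistent with the model")
    path = [start]
    for t in range(len(beta) - 1):
        i = path[-1]
        # Adapted transition probabilities: proportional to M[i][j] * beta[t+1][j],
        # so transitions that cannot reach the observation get weight zero.
        weights = [m[i][j] * beta[t + 1][j] for j in range(len(m))]
        path.append(rng.choices(range(len(m)), weights=weights)[0])
    return path


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
beta = backward_messages(M, end_state=2, steps=3)  # the object is seen in s3 at t = 3
rng = random.Random(42)
print([sample_path(M, 0, beta, rng) for _ in range(5)])
```

Every sampled trajectory agrees with both observations, so no sample has to be rejected.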

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Page 43: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

43

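Count Queries on Uncertain Data

[Figure: query point q and uncertain objects A, B and C, with the labels H and H3 and the probabilities 0.8, 0.2 and 0.4; the example equation is not preserved in this transcript]

44

Count Queries on Uncertain Data

[Figure: same example as on the previous slide]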

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III. The probability of a class can be computed in polynomial time
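The three conditions can be illustrated with a count query. The sketch below assumes that the values 0.8, 0.2 and 0.4 from the running example are independent probabilities that each object satisfies the query predicate (which value belongs to which object is not recoverable here). All possible worlds with the same count form one equivalence class, so there are only n + 1 classes, and their probabilities follow from expanding a generating function one factor at a time. The function name `count_distribution` is illustrative.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    Equivalent to expanding the generating function
    prod_i ((1 - p_i) + p_i * x) and reading off the coefficients.
    """
    dist = [1.0]  # with no objects, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # object does not satisfy the predicate
            nxt[k + 1] += mass * p      # object satisfies the predicate
        dist = nxt
    return dist


dist = count_distribution([0.8, 0.2, 0.4])
for k, prob in enumerate(dist):
    print(f"P(count = {k}) = {prob:.3f}")
```

The loop runs in O(n²) time, whereas enumerating all 2ⁿ possible worlds would be exponential.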

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of sampled results is an unbiased approximation of the true distribution of results
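A minimal sketch of this sampling scheme for a count query, again assuming independent objects with the probabilities 0.8, 0.2 and 0.4 (an assumption about the running example). Each sampled world decides independently for every object whether it satisfies the predicate; the names and the fixed seed are illustrative.

```python
import random


def sample_count_distribution(probs, num_worlds, seed=0):
    """Estimate the distribution of the count by sampling possible worlds."""
    rng = random.Random(seed)
    hits = [0] * (len(probs) + 1)
    for _ in range(num_worlds):
        # Materialize one possible world and evaluate the count on it.
        count = sum(1 for p in probs if rng.random() < p)
        hits[count] += 1
    return [h / num_worlds for h in hits]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100)
print(estimate)
```

With only 100 worlds the estimates can be noticeably off; increasing `num_worlds` tightens them at linear cost.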

48

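Sampling Example

[Figure: the running example with query point q, objects A, B and C, and the probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators [values not preserved in this transcript]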

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

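Sampling Confidences

[Figure: the running example with query point q, objects A, B and C, and the probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators [values not preserved in this transcript]

Use statistical methods to assess the quality of estimators

E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]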

True probability
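The interval [0.442, 0.638] is consistent with a Wald interval for an estimate of 0.54 from n = 100 samples at a significance level of 0.05. These three inputs are inferred from the interval itself, not stated in the transcript, so the sketch below should be read as a plausibility check rather than the slide's exact computation.

```python
from math import sqrt
from statistics import NormalDist  # available since Python 3.8


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a proportion estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(0.54, 100)
print(f"[{low:.3f}, {high:.3f}]")
```

The Wald interval is known to be unreliable for estimates close to 0 or 1 or for small n; other intervals (for example the Wilson interval) are often preferred in those cases.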

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: “best guess” answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries. Example: Generating Functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the X, Y and Time axes up to the present time, together with its 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty
Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
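The bead model can be sketched as a membership test: between two exact samples, a location is possible at time t only if it is reachable from the earlier sample and the later sample is still reachable from it, both under the speed bound. The tuple layout `(t, x, y)`, the function name `in_bead` and the free-space Euclidean distance are assumptions made for illustration.

```python
from math import hypot


def in_bead(point, t, sample_a, sample_b, v_max):
    """Check whether `point` is a possible location at time t.

    sample_a and sample_b are exact samples (t, x, y) with
    sample_a[0] <= sample_b[0]; v_max is the bound on the speed.
    """
    (ta, xa, ya), (tb, xb, yb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    x, y = point
    reachable_from_a = hypot(x - xa, y - ya) <= v_max * (t - ta)
    can_still_reach_b = hypot(xb - x, yb - y) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b


a, b = (0.0, 0.0, 0.0), (10.0, 10.0, 0.0)
print(in_bead((5.0, 6.0), 5.0, a, b, v_max=2.0))
print(in_bead((5.0, 9.0), 5.0, a, b, v_max=2.0))
```

At a fixed t the possible locations are the intersection of two disks; sweeping t from one sample to the next traces out the bead, and a sequence of beads forms the necklace.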

57

Modeling Uncertain Trajectories
Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at time t1) and p2 (at time t2), with a point q and the time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories
Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside a range (disk) around “Q” between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m-1), te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time planes at T = tb, T = t1 and T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time planes at T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals of its parent's interval, e.g. (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12), which may in turn have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and uncertain objects Tr1 to Tr5, each bounded by a circle; Rmin, Rmax and a distance Rd are marked; the minimum distance of Tr4 exceeds Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by [equation not preserved in this transcript]

The probability that a given object Trj is the NN of Q is given by [equation not preserved in this transcript]

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
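The two probabilities described in words above can be written as follows. This is a reconstruction from the verbal description, not the slide's original notation: $pdf_i$ denotes the location pdf of $Tr_i$, $B(Q, R_d)$ the disk of radius $R_d$ around $Q$, and the names $P^{WD}_i$, $pdf^{WD}_i$ and $P^{NN}_j$ are assumed here for illustration.

```latex
P^{WD}_i(R_d) = \iint_{B(Q, R_d)} pdf_i(x, y)\, dx\, dy ,
\qquad
pdf^{WD}_i(R_d) = \frac{d}{dR_d}\, P^{WD}_i(R_d)

P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d)
           \prod_{i \neq j} \bigl( 1 - P^{WD}_i(R_d) \bigr)\, dR_d
```

The integrand reads: $Tr_j$ is at distance exactly $R_d$ from $Q$, and every other object is farther away than $R_d$. Beyond $R_{max}$ at least one object is certainly closer, so the product vanishes there.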

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and uncertain objects Tr1 to Tr5, each bounded by a circle, with Rmin and Rmax marked; Z1 and Z2 are possible locations with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
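A one-dimensional, discrete analogue makes the convolution argument concrete. The sketch below assumes two independent variables with uniform pmfs over a few integer positions (illustrative values, not taken from the slides) and builds the pmf of their difference, which is the convolution of the pmf of Vi with the reflected pmf of Vq.

```python
def pmf_of_difference(pmf_i, pmf_q):
    """pmf of Vi - Vq for independent Vi, Vq given as {position: probability}."""
    result = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            d = xi - xq
            result[d] = result.get(d, 0.0) + pi * pq
    return result


uniform_i = {x: 1 / 3 for x in (4, 5, 6)}
uniform_q = {x: 1 / 3 for x in (0, 1, 2)}
diff = pmf_of_difference(uniform_i, uniform_q)

# Probability that Vi is within distance 4 of Vq, read off the difference pmf.
within_4 = sum(p for d, p in diff.items() if abs(d) <= 4)
print(sorted(diff.items()), within_4)
```

Once the difference variable is available, the uncertain query object can be treated as if it were crisp and located at the origin, which is what the following slides exploit.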

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane, each with a support of diameter 4r and a peak value of 3/(4πr²); axis ticks at -40, 0 and +40]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is [equation not preserved in this transcript]

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
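The shape of this distance function can be derived directly. The sketch assumes linear motion of the expected location along $TR_{iq}$, starting at a relative position $(x_{iq}, y_{iq})$ at $t = 0$; the symbols $x_{iq}$, $y_{iq}$, $B$ and $C$ are names introduced here, only $A$ appears on the slide.

```latex
d_{iq}(t) = \sqrt{(x_{iq} + V_{xiq}\, t)^2 + (y_{iq} + V_{yiq}\, t)^2}
          = \sqrt{A t^2 + B t + C}

A = V_{xiq}^2 + V_{yiq}^2 , \qquad
B = 2\,(x_{iq} V_{xiq} + y_{iq} V_{yiq}) , \qquad
C = x_{iq}^2 + y_{iq}^2
```

$A > 0$ whenever the relative velocity is non-zero, and the graph of the square root of such a quadratic in $t$ is a branch of a hyperbola.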

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0)

Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set d_iq(t) = d_jq(t)). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of the distance functions of TR1 and TR2, starting at tb, with critical time-points t11 and t12; vertical axis: dist; horizontal axis: time]
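The "at most twice" observation follows from comparing squared distances: if each squared distance is a quadratic in t, their difference is a quadratic with at most two real roots. The sketch below assumes each distance function is given by the coefficients (A, B, C) of its squared form A·t² + B·t + C; the function names and the example coefficients are illustrative.

```python
from math import sqrt


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two squared-distance quadratics are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:  # the difference is linear (or constant)
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


def evaluate(f, t):
    return f[0] * t * t + f[1] * t + f[2]


def lower_envelope(f, g, tb, te):
    """Pieces (start, end, label) of the lower envelope of f and g on [tb, te]."""
    inner = [t for t in critical_times(f, g, tb, te) if tb < t < te]
    cuts = [tb] + inner + [te]
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2
        label = "f" if evaluate(f, mid) <= evaluate(g, mid) else "g"
        pieces.append((start, end, label))
    return pieces


f = (1.0, 0.0, 1.0)     # t^2 + 1
g = (4.0, -12.0, 10.0)  # (2t - 3)^2 + 1
print(critical_times(f, g, 0.0, 10.0))
print(lower_envelope(f, g, 0.0, 4.0))
```

Comparing squared distances avoids square roots and does not change which function is lower, since both distances are non-negative.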

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11 and t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) away from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with time points t1, t2 and t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti [formal definition not preserved in this transcript]

Example: if the pdf of selecting a possible path is uniform, each possible path has probability 1/|PPi|

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t) [formal definition not preserved in this transcript]

73

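Combine the two probabilities…

[Figure: possible paths and possible locations of object a at t = 4; the segment m–p1 is highlighted]

Probability of falling inside segment m–p1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(a(t) \in S \mid PL_p^a(t))$

where Pr(p) is the probability of path p and the second factor is evaluated over the possible locations PL of a along p, e.g. with a uniform pdf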
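The combination of path probability and location probability can be sketched as a weighted sum. The representation below is an assumption made for illustration: each possible path carries its probability and the interval of possible locations at time t (as offsets along the path), the location pdf within that interval is uniform, and the segment S is given per path as an offset interval.

```python
def prob_in_segment(paths, segment):
    """Probability that the object lies in segment S at the given time.

    paths: list of (path_probability, (pl_start, pl_end)), the possible
           locations as an interval of offsets along that path.
    segment: {path_index: (s_start, s_end)}, the part of S on each path;
             paths that do not contain S are simply absent.
    """
    total = 0.0
    for idx, (path_prob, (pl_start, pl_end)) in enumerate(paths):
        if idx not in segment:
            continue
        s_start, s_end = segment[idx]
        overlap = max(0.0, min(pl_end, s_end) - max(pl_start, s_start))
        total += path_prob * overlap / (pl_end - pl_start)
    return total


# Two equally likely paths; S covers half of the possible locations on path 0.
paths = [(0.5, (2.0, 6.0)), (0.5, (1.0, 5.0))]
segment = {0: (4.0, 6.0)}
p = prob_in_segment(paths, segment)
print(p)
```

With these illustrative numbers the result is 0.5 · 0.5 = 0.25, the same product as in the slide's example.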

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree “bubbles” along road-network segments; one path, annotated with its earliest/latest times of arrival, is a possible path but definitely not part of the answer to the range query]
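The network expansion step can be sketched as a bounded Dijkstra search. This is a simplified, vertex-level version that assumes q coincides with a vertex and that the graph is given as an adjacency list; the slides' expansion tree works on edges, and the graph and names below are purely illustrative.

```python
import heapq


def network_expansion(graph, source, r):
    """Vertices whose network distance from `source` is smaller than r.

    graph: {vertex: [(neighbor, edge_length), ...]}
    Returns {vertex: network_distance}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {
    "q": [("A", 1.0), ("B", 2.0)],
    "A": [("q", 1.0), ("C", 2.0)],
    "B": [("q", 2.0), ("D", 3.0)],
    "C": [("A", 2.0), ("E", 4.0)],
    "D": [("B", 3.0)],
    "E": [("C", 4.0)],
}
reachable = network_expansion(graph, "q", r=4.0)
print(reachable)
```

Only the edges touching the returned vertices would then need their movement R-trees fetched in the filtering step.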

77

Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997
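As an illustration of trading a bounded error for data size, here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper, it measures distance to the chord segment (a common variant of the perpendicular distance), and the example polyline is made up.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point is shared by both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(route, epsilon=0.1))
```

Here `epsilon` plays the role of the acceptable error: every dropped point lies within that distance of the simplified route.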

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not a “Z” axis… one cannot travel back in time
  – How long may the data at the sink remain “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension

vs

Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

83

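A highway example…

[Figure: position (pos) of a car over time (t), with a query region Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the minimum and maximum speeds vmin and vmax bounding the reachable positions]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with vmin, vmax and the query region Q]

Assuming uniform distribution…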

86

A highway example…

[Figure: position (pos) over time (t); 50% and 30% of the position distribution lie inside the query region Q at two points in time]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: as before]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g. Markov Chain)
  – Discrete Time + Continuous Space (e.g. Harris Chain)
  – Continuous Time + Discrete Space (e.g. Markov Process)
  – Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; interior hole: 0.4 (one neighbour), 0.2 (stay), 0.4 (other neighbour); border hole: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

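› A Markov Chain is a “memoryless” Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$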

94

How can we model this

› First hit: $(0\;\;0\;\;1\;\;0\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0)$

› Second hit: $(0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0.16\;\;0.16\;\;0.36\;\;0.16\;\;0.16\;\;0\;\;0\;\;0\;\;0)$

› 40th hit: $(0\;\;0\;\;1\;\;0\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M^{40} = (0.08\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.08)$

where M is the 9 × 9 transition matrix from the previous slide
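The propagation above can be reproduced with plain vector-matrix products. The sketch below rebuilds the 9-bucket transition matrix from the stated probabilities (interior: 0.4 / 0.2 / 0.4; borders: stay 0.4, move 0.6) and starts the ball in the third bucket; the function names are illustrative.

```python
def build_transition_matrix(n=9):
    """Row i holds the probabilities of moving from bucket i to each bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector `dist` by the matrix `m`."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = build_transition_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])
```

Because every row has at most three non-zero entries, a sparse representation would make each step linear in the number of states.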

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: transition graph over the states s1, s2 and s3, with edges labelled by the transition probabilities]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with transition arrows]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)

99

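ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with transition arrows]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)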

100

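ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with transition arrows]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)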

101

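ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with transition arrows]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)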

102

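ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with transition arrows]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3, only for the worlds that were not in s1 or s2 at t=2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the “winner” trajectories, and two matrices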

103

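ST - Window Queries

$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

State distribution over (s1, s2, s3, s4): (1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)

[Figure: states s1, s2, s3 and the additional state s4]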
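The two-matrix construction can be sketched as follows. An absorbing "winner" state is appended to the chain; transitions that lead into a window state at a time inside T are redirected to it (the M⁺ matrix), all other transitions are left unchanged (the M⁻ matrix). The function names are illustrative, and the sketch assumes that the window starts at t ≥ 1, so the start state itself never counts as a hit.

```python
def augment(m, window_states, into_window):
    """Add an absorbing winner state; redirect window hits if into_window."""
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (into_window and j in window_states) else j
            out[i][target] += m[i][j]
    out[n][n] = 1.0  # once a winner, always a winner
    return out


def step(dist, m):
    size = len(m)
    return [sum(dist[i] * m[i][j] for i in range(size)) for j in range(size)]


def window_probability(m, start, window_states, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]), t_start >= 1."""
    m_minus = augment(m, window_states, into_window=False)
    m_plus = augment(m, window_states, into_window=True)
    dist = [0.0] * (len(m) + 1)
    dist[start] = 1.0
    for t in range(1, t_end + 1):
        dist = step(dist, m_plus if t >= t_start else m_minus)
    return dist[len(m)]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window_states={0, 1}, t_start=2, t_end=3)
print(round(result, 2))
```

No path enumeration is needed: the cost is one vector-matrix product per time step, regardless of how many possible paths exist.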

104

Multiple Observations

› So far we had only one observation, from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations, we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

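State distribution over (S1-, S2-, S3-, S1+, S2+, S3+):
t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3; legend: ∎ marks worlds that have intersected the window, ¬∎ worlds that have not]

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$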

107

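Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare) \cdot P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(o \wedge \blacksquare)}{P(o \wedge \blacksquare) + P(o \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$

where ∎ denotes the event that the trajectory intersects the query window and o the observation of the object in s3

State distribution at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)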
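The doubled state space and the Bayes step can be sketched together. Each state si is split into a "-" copy (window not yet intersected) and a "+" copy (window intersected); a final observation then only needs the two copies of the observed state. The index layout (first all "-" copies, then all "+" copies) and the function names are assumptions made for illustration.

```python
def doubled_matrices(m, window_states):
    """Build M- and M+ over the states (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m[i][j]          # "-" worlds stay "-"
            m_minus[n + i][n + j] = m[i][j]  # "+" worlds stay "+"
            m_plus[n + i][n + j] = m[i][j]
            if j in window_states:
                m_plus[i][n + j] = m[i][j]   # entering the window: "-" to "+"
            else:
                m_plus[i][j] = m[i][j]
    return m_minus, m_plus


def step(dist, m):
    size = len(m)
    return [sum(dist[i] * m[i][j] for i in range(size)) for j in range(size)]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, window_states={0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # object starts in s1, no hit yet
dist = step(dist, m_minus)              # t = 1, outside T = [2, 3]
dist = step(dist, m_plus)               # t = 2, inside T
dist = step(dist, m_plus)               # t = 3, inside T

# Observation at t = 3: the object is seen in s3 (index 2).
hit, no_hit = dist[3 + 2], dist[2]
posterior = hit / (hit + no_hit)
print([round(x, 2) for x in dist], round(posterior, 2))
```

Conditioning on the observation amounts to discarding all worlds that end in another state and renormalizing, which is exactly the last line of the Bayes computation above.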

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entry over the location and time axes, labelled Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
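One way to adapt the transition probabilities so that every sampled trajectory conforms to a later observation is a backward pass followed by forward sampling. This is a generic sketch of that idea, not necessarily the exact method of the cited paper: `backward_weights` computes, for each time and state, the probability of still reaching the observed state, and each sampling step re-weights the original transitions by it. The chain and the observation (start in s1, seen in s3 at t = 3) reuse the earlier example.

```python
import random


def backward_weights(m, observed_state, steps):
    """beta[t][s] = P(state at time `steps` is observed_state | state s at t)."""
    n = len(m)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][observed_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for s in range(n):
            beta[t][s] = sum(m[s][s2] * beta[t + 1][s2] for s2 in range(n))
    return beta


def sample_conforming_path(m, start, observed_state, steps, rng):
    """Sample a path from `start` that ends in `observed_state` after `steps`."""
    beta = backward_weights(m, observed_state, steps)
    path = [start]
    for t in range(steps):
        s = path[-1]
        # Adapted transition: original probability times the chance of
        # still reaching the observation from the successor state.
        weights = [m[s][s2] * beta[t + 1][s2] for s2 in range(len(m))]
        path.append(rng.choices(range(len(m)), weights=weights)[0])
    return path


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
rng = random.Random(42)
paths = [sample_conforming_path(M, 0, 2, 3, rng) for _ in range(5000)]

# Fraction of samples that are in s1 or s2 at t = 2 or t = 3.
hit_ratio = sum(1 for p in paths if p[2] in (0, 1) or p[3] in (0, 1)) / len(paths)
print(round(hit_ratio, 2))
```

Every sample is usable because none violates the observation, unlike rejection sampling; the sampled hit ratio should approach the exact posterior 0.32 / 0.36 computed with Bayes' theorem.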

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric Models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 44: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

44

q

08 02

04

H

C A

B

H3

Example = =

Count Queries on Uncertain Data

45

The Paradigm of Equivalent Worlds

Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:

I. A traditional query on certain data can be answered in polynomial time.

II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.

III. The probability of a class can be computed in polynomial time.
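As a hedged illustration of the three conditions, consider a count query: how many uncertain objects satisfy a predicate? The sketch below assumes mutually independent objects, each with a known probability of satisfying the predicate (the function name and the example probabilities are illustrative assumptions). All worlds with the same count are equivalent, so there are only |DB| + 1 classes, and their probabilities follow from expanding the generating function ∏(1 − p_i + p_i·x) one factor at a time.

```python
def count_distribution(probs):
    """Return dist with dist[k] = P(exactly k objects satisfy the predicate).

    Assumption: objects are mutually independent and probs[i] is the
    probability that object i satisfies the query predicate.
    """
    dist = [1.0]  # generating function of the empty database
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # object does not satisfy the predicate
            nxt[k + 1] += mass * p      # object satisfies the predicate
        dist = nxt
    return dist


# Hypothetical probabilities for three uncertain objects.
example = count_distribution([0.8, 0.2, 0.4])
```

The loop touches each of the at most |DB| + 1 classes once per object, so the whole distribution is obtained in O(|DB|²) time instead of enumerating 2^|DB| possible worlds.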

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds.

Samples are drawn independently and without bias.

Evaluate the query predicate on each sampled world.

The distribution of the sampled results is an unbiased approximation of the true distribution of results.
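The sampling steps can be sketched as follows. This is a minimal sketch under the assumption that each object independently satisfies some condition with a known probability; the names `sample_world` and `estimate`, the example probabilities and the predicate are all illustrative, not taken from the slides.

```python
import random


def sample_world(probs, rng):
    """Draw one possible world: one boolean per uncertain object."""
    return tuple(rng.random() < p for p in probs)


def estimate(probs, predicate, n_samples, seed=0):
    """Fraction of sampled worlds in which the predicate holds."""
    rng = random.Random(seed)
    hits = sum(predicate(sample_world(probs, rng)) for _ in range(n_samples))
    return hits / n_samples


# Hypothetical query: do at least two of the three objects qualify?
probs = [0.8, 0.2, 0.4]
approx = estimate(probs, lambda world: sum(world) >= 2, 10_000)
```

For these example probabilities the exact answer is 0.368 + 0.064 = 0.432, so the estimate can be checked against it.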

48

Sampling Example

[Figure: running example around query point q; recoverable labels: H, C, A, B, H3 and the values 0.8, 0.2, 0.4]

Drawing 100 possible worlds yields estimators that can be compared to the exact probabilities.

No indication of the reliability or confidence of the estimates.

49

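Sampling Confidences

[Figure: running example around query point q, with the true probability marked]

Drawing 100 possible worlds yields estimators of the result probabilities.

Use statistical methods to assess the quality of the estimators, e.g., the Wald test:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution and $n$ is the number of sampled worlds.

At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638] (consistent with $\hat{p} = 0.54$, $n = 100$ and $\alpha = 0.05$).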
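The interval [0.442, 0.638] quoted on this slide can be reproduced with a Wald interval. The sketch below assumes an estimate of 0.54 from 100 sampled worlds and a significance level of 0.05; these three values are inferred from the interval and are not stated in the slide text.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal percentile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: estimate 0.54 from 100 samples at alpha = 0.05.
low, high = wald_interval(0.54, 100)
```

Because the half-width shrinks with the square root of n, quadrupling the number of sampled worlds halves the width of the interval.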

50

Uncertain Spatial Data Management Summary

Motivation
  • Flood of geo-spatial data
  • Enriched with additional contexts (text, social, multimedia)
  • Inherent uncertainty

Data Cleaning
  • "Best guess" answers
  • Unreliable results
  • Biased results

Paradigm of Equivalent Worlds
  • Efficient solution for the most prominent types of spatial queries
  • Example: Generating Functions

Approximations
  • Monte-Carlo sampling
  • Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

  • Models
    • Motion
    • Location Uncertainty

  • Queries
    • Part 1: NN (free-space motion)
      • Semantics and Processing
    • Part 2: Range (road-network constraints)
      • Semantics and Processing

  • Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, together with its 2D route in the X-Y plane]

Approximation does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

  • Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
    • Assumed exact location samples
    • Unknown speed in between samples, bounded by a maximum speed

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003

G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
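The bead (space-time prism) model can be illustrated with a membership test: a point is a possible location at time t if it is reachable from the earlier sample and the later sample is still reachable from it, both under the speed bound. This is a minimal sketch assuming free 2D motion and Euclidean distances; the function name, the sample layout `(x, y, t)` and the numbers in the usage lines are illustrative.

```python
from math import hypot


def in_bead(p, t, sample_a, sample_b, v_max):
    """True if point p = (x, y) is a possible location at time t.

    sample_a and sample_b are exact samples (x, y, t) with
    sample_a[2] <= sample_b[2]; v_max bounds the speed in between.
    """
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = hypot(p[0] - xa, p[1] - ya) <= v_max * (t - ta)
    can_still_reach_b = hypot(p[0] - xb, p[1] - yb) <= v_max * (tb - t)
    return reachable_from_a and can_still_reach_b


# Hypothetical samples: (0, 0) at t = 0 and (10, 0) at t = 10, v_max = 2.
a, b = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
inside = in_bead((5.0, 3.0), 5.0, a, b, 2.0)
outside = in_bead((5.0, 9.0), 5.0, a, b, 2.0)
```

At a fixed time the set of possible locations is the intersection of two disks, which is the cross-section of the bead at that time.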

57

Modeling Uncertain Trajectories

Model 2: Constraints

  • Motion along road network
  • Uncertainty restricted to road segments

[Figure: road network with locations p1, p2 and q, and the times t1 and t2 on a time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space with location uncertainty, with the time instants tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children over sub-intervals of its own interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), … below the first node, and so on recursively.]

Root: the parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that would have the highest probability of being the NN to Trq if the parent were excluded, within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q with uncertain objects Tr1 to Tr5, each bounded by a circle; Rmin and Rmax are marked, and Tr4 lies at a minimum distance greater than Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the location pdf of Tri over the disk of radius Rd around Q.

The probability that a given object Trj is the NN of Q is obtained as follows: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
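The verbal description above corresponds to the usual formulation of probabilistic NN queries over uncertain objects. As a hedged sketch, with the symbols $pdf_i$, $P^{WD}_i$ and $P^{NN}_j$ being notation assumed here rather than taken from the slide:

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
P^{WD}_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the nearest neighbor of Q:
% Tr_j is at distance R_d and every other object is farther away
P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d}
           \prod_{i \ne j} \left( 1 - P^{WD}_i(R_d) \right) dR_d
```

The first expression is a double integral over the plane, which is why the uncertain-query case on the following slide is described as a quadruple integration.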

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and uncertain objects Tr1 to Tr5, with Rmin and Rmax marked; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2 over disks of diameter 2r, plotted above the (x, y) plane]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Since Viq = Vi + (–Vq), independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
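The convolution argument can be checked on a small discrete case. The sketch below is a 1D, discrete stand-in for the 2D continuous pdfs on the slide (an assumption made for brevity): for independent variables, the pmf of the difference is obtained by summing the products of all value pairs, which is exactly a convolution of one pmf with the mirrored other.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete V_i and V_q.

    Each pmf is a dict mapping a value to its probability.
    """
    out = defaultdict(float)
    for vi, pi in pmf_i.items():
        for vq, pq in pmf_q.items():
            out[vi - vq] += pi * pq  # independence: joint = product
    return dict(out)


def mean(pmf):
    """Expected value of a discrete random variable."""
    return sum(value * prob for value, prob in pmf.items())


# Hypothetical uniform locations on {0, 1} and {4, 5}.
v_i = {0: 0.5, 1: 0.5}
v_q = {4: 0.5, 5: 0.5}
v_iq = difference_pmf(v_i, v_q)
```

In this example the mean of the difference equals the difference of the means (0.5 − 4.5 = −4), and the support of the difference is wider than either input support.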

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) plotted above the (x, y) plane; each has a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is the square root of a quadratic in t with leading coefficient A: a hyperbola, since A > 0.

Now we have a collection of such distance functions (one for each i).

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0).

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

Example: lower envelope of two distance functions

[Figure: distance (dist) over time for TR1 and TR2 and their lower envelope LE12, with the critical time-points t11 and t12 after tb. NOTE: time dimension on the horizontal axis]
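Critical time-points and a first-level lower envelope can be sketched as follows. The code assumes each distance function has the form d(t) = sqrt(A·t² + B·t + C) and is given as a tuple (A, B, C); since distances are non-negative, two functions intersect exactly where their squares do, which is a quadratic equation with at most two roots. The envelope routine is deliberately naive (all pairs, then one midpoint evaluation per elementary interval) and is not the divide-and-conquer construction; all names are illustrative.

```python
from math import sqrt


def dist(f, t):
    """Evaluate d(t) = sqrt(A t^2 + B t + C) for f = (A, B, C)."""
    a, b, c = f
    return sqrt(max(a * t * t + b * t + c, 0.0))


def critical_times(f, g, tb, te):
    """Times strictly inside (tb, te) where f and g intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({r for r in roots if tb < r < te})


def lower_envelope(funcs, tb, te):
    """Naive lower envelope as a list of (function index, start, end)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        best = min(range(len(funcs)), key=lambda k: dist(funcs[k], mid))
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)  # extend the last piece
        else:
            envelope.append((best, lo, hi))
    return envelope


# Hypothetical example: f1 stays at distance 2; f2 follows |t - 5|.
f1 = (0.0, 0.0, 4.0)
f2 = (1.0, -10.0, 25.0)
pieces = lower_envelope([f1, f2], 0.0, 10.0)
```

Here the two functions cross at t = 3 and t = 7, so the envelope consists of three pieces: f1, then f2, then f1 again.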

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions of TR1 to TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: the pdf of selecting a possible path may be uniform.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).
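The definition of possible paths can be sketched as a bounded depth-first enumeration. The sketch simplifies the setting by assuming that both samples lie on vertices and that each edge carries its minimum travel time; the graph, the vertex names and the function name are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst with minimum time cost <= budget.

    graph[u] maps each neighbor v of u to the minimum travel time of edge
    (u, v). Returns a list of (path, cost) pairs.
    """
    results = []

    def extend(path, cost):
        node = path[-1]
        if node == dst:
            results.append((list(path), cost))
            return
        for nxt, travel_time in graph.get(node, {}).items():
            # Skip visited vertices and prune branches over the time budget.
            if nxt not in path and cost + travel_time <= budget:
                path.append(nxt)
                extend(path, cost + travel_time)
                path.pop()

    extend([src], 0.0)
    return results


# Hypothetical road network with edge weights as minimum travel times.
roads = {
    "v1": {"v5": 4, "v3": 2},
    "v3": {"v4": 2},
    "v4": {"v5": 3},
    "v5": {},
}
tight = possible_paths(roads, "v1", "v5", 6)
loose = possible_paths(roads, "v1", "v5", 7)
```

With a time budget of 6 only the direct edge qualifies; with a budget of 7 the detour via v3 and v4 becomes a second possible path.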

73

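Combine the two probabilities…

[Figure: possible paths of object a and its possible locations at t = 4, with the labels a and m]

Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid p\big)$$

where Pr(p) is the probability of path p, and the second factor is the probability that a location drawn from the possible locations PL_p(t) on p falls inside S (e.g., under a uniform pdf).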

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure stored for vertices
    › On-disk, retrieve on-demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbles" along road-network segments. Based on the earliest/latest times of arrival, a path can be possible but definitely not part of the answer to the range query]
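The network expansion step can be sketched with a distance-bounded Dijkstra search. This is a simplified sketch: it assumes the query location q is itself a vertex and returns the vertices reached within network distance r rather than edges or partial edges; the graph and all names are hypothetical.

```python
import heapq


def expansion(graph, q, r):
    """Vertices whose network distance from q is smaller than r.

    graph[u] maps each neighbor v of u to the length of edge (u, v).
    Returns a dict vertex -> network distance from q.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, {}).items():
            nd = d + length
            # Expand only while the search stays inside the query radius.
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical road network around the query vertex "q".
network = {"q": {"A": 1, "B": 4}, "A": {"C": 2}, "B": {}, "C": {"D": 5}}
reached = expansion(network, "q", 4)
```

Only the edges between reached vertices would then need their movement R-trees fetched in the filtering step.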

77

Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
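The trade of an acceptable error for a smaller data size can be illustrated with the classical Douglas-Peucker line simplification. Note that this sketch is the basic recursive algorithm, not the sped-up variant of the cited paper, and it simplifies a 2D route only, ignoring the time dimension; the example points and the tolerance are hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Simplify a polyline so that dropped points deviate by at most eps."""
    if len(points) < 3:
        return list(points)
    index, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > d_max:
            index, d_max = i, d
    if d_max <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], eps)
    right = douglas_peucker(points[index:], eps)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical route with one nearly collinear point.
route = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
reduced = douglas_peucker(route, 0.1)
```

In this example the point (1, 0.05) deviates by only 0.05 from the segment between its neighbors, so it is dropped while the sharp turn at (3, 3) is kept.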

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long should the data at the sink be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  • Probabilistic results
  • Time dimension

vs.

Geometric models for uncertain spatio-temporal data
  • Probabilistic results
  • Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: Independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure, built up over six slides: position (pos) over time (t) of a car on a highway with a query region Q; the builds add the speed bounds vmin and vmax and the labels 50 and 30, matching the probabilities 0.5 and 0.3 used below]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

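› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; in the interior the ball moves left, stays, or moves right with probabilities 0.4, 0.2, 0.4; at the border it moves inward with probability 0.6 and stays with probability 0.4]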

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov Chain is a "memoryless" stochastic process

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$
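The evolution of the ball's position can be sketched as repeated vector-matrix multiplication. The matrix entries and the start in the third bucket follow the example; the helper names `board_matrix` and `step` are illustrative.

```python
def board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4  # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, matrix):
    """One hit: multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


M = board_matrix()
start = [0.0] * 9
start[2] = 1.0  # the ball starts in the third bucket
after_one = step(start, M)
after_two = step(after_one, M)

# The vector shown for the 40th hit is a fixed point of M.
limit = [0.08] + [0.12] * 7 + [0.08]
limit_next = step(limit, M)
```

The last two lines check that the distribution shown for the 40th hit no longer changes under further hits, i.e., it is the stationary distribution of the chain.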

94

How can we model this

› First hit:

$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit:

$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit:

$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure, built up over seven slides: the states s1, s2, s3 with their transition probabilities, unrolled over the time steps t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take.

› Starting from the distribution (1, 0, 0) at t = 0, multiplying by M yields
  – t = 1: (0, 0, 1)
  – t = 2: (0, 0.8, 0.2)
  – t = 3: (0.48, 0.16, 0.36)

› Worlds that are in s1 or s2 at t = 2 already satisfy the query and must not be counted again. Propagating only the remaining worlds from t = 2 gives (0, 0.16, 0.04) at t = 3, so Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories and two matrices, M− (for transitions to a time outside T) and M+ (for transitions to a time inside T)

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$(1, 0, 0, 0) \cdot M^- = (0, 0, 1, 0)$

$(0, 0, 1, 0) \cdot M^+ = (0, 0, 0.2, 0.8)$

$(0, 0, 0.2, 0.8) \cdot M^+ = (0, 0, 0.04, 0.96)$
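The window query on the three-state example can be sketched with one extra absorbing state that collects all worlds that have already hit the window. The matrix and the query (states s1 or s2 during T = [2,3]) follow the example; the function names are illustrative, and the sketch assumes the window starts at t ≥ 1.

```python
def step(dist, matrix):
    """Multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def augmented(m, window_states, absorb_hits):
    """Add an absorbing 'hit' state as last row/column.

    With absorb_hits=True, every transition into a window state is
    redirected to the absorbing state (matrix M+); otherwise the
    original transitions are kept (matrix M-).
    """
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    out[n][n] = 1.0
    for i in range(n):
        for j in range(n):
            if absorb_hits and j in window_states:
                out[i][n] += m[i][j]
            else:
                out[i][j] = m[i][j]
    return out


def window_probability(m, start, window_states, t_start, t_end):
    """P(object is in a window state at least once during [t_start, t_end])."""
    m_minus = augmented(m, window_states, False)
    m_plus = augmented(m, window_states, True)
    dist = list(start) + [0.0]
    for t in range(1, t_end + 1):
        dist = step(dist, m_plus if t >= t_start else m_minus)
    return dist[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

The cost is one sparse vector-matrix product per time step, regardless of the exponential number of possible paths.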

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

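State order: (S1−, S2−, S3−, S1+, S2+, S3+), where the Si− classes correspond to ¬∎ (no hit so far) and the Si+ classes to ∎ (hit).

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

with

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

Distributions over the six classes:
  – t = 0: (1, 0, 0, 0, 0, 0)
  – t = 1: (0, 0, 1, 0, 0, 0)
  – t = 2: (0, 0, 0.2, 0, 0.8, 0)
  – t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)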

107

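Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory passes the query window and obs the observation in s3. With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over the classes (S1−, S2−, S3−, S1+, S2+, S3+):

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$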
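The two-observation case can be sketched by doubling the state space into "not yet hit" and "already hit" classes and conditioning on the final observation. The matrix, the window (s1 or s2 during T = [2,3]) and the observation in s3 at t = 3 follow the example; the function names and the index layout (first the − classes, then the + classes) are assumptions of this sketch.

```python
def step(dist, matrix):
    """Multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def hit_tracking_matrices(m, window_states):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Outside the window time, the hit flag is simply carried along.
            m_minus[i][j] = m[i][j]
            m_minus[n + i][n + j] = m[i][j]
            # Worlds that have hit the window stay in the '+' classes.
            m_plus[n + i][n + j] = m[i][j]
            if j in window_states:
                m_plus[i][n + j] = m[i][j]  # first hit: '-' becomes '+'
            else:
                m_plus[i][j] = m[i][j]
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = hit_tracking_matrices(M, {0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # observed in s1 at t = 0
dist = step(dist, m_minus)              # t = 1, outside the window
dist = step(dist, m_plus)               # t = 2, inside the window
dist = step(dist, m_plus)               # t = 3, inside the window

# Condition on the second observation: object seen in s3 at t = 3.
posterior = dist[3 + 2] / (dist[2] + dist[3 + 2])
```

Conditioning only needs the two classes S3− and S3+, because all other classes are inconsistent with the observation.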

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over location and time, with a pointer to the object id (Poid)]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 45: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

45

The Paradigm of Equivalent Worlds

A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied

I A traditional query on certain data can be answered in polynomial time

II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|

III The probability of a class can be computed in polynomial time

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples drawn independent and unbiased

Evaluated the query predicate on each world

Distribution of sampled results is an unbiased approximation of the true distribution of results

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the followingestimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the followingestimators

Use statistical methods to assess the quality of estimators

Eg Wald-Test

Where is the percentile of the standard normal distribution

At a significance level of the true probability is in the interval [0442 0638]

True probability

50

Uncertain Spatial Data Management Summary

Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty

Data Cleaning: “best guess” answers; unreliable results; biased results

Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions

Approximations: Monte-Carlo sampling; probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: axes X, Y and Time, with the present time marked; labels: 2D route, 3D trajectory]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty: Location Uncertainty Models

Imprecision of various sources (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumes exact location samples
• Unknown speed in between samples, bounded by a known maximum speed

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
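As a rough illustration of Model 1, the sketch below tests whether a point can belong to the bead (space-time prism) between two exact samples. It assumes free-space Euclidean motion and a single known speed bound; the function name and the sample values are hypothetical.

```python
import math


def in_prism(p, t, sample_a, sample_b, v_max):
    """True if location p at time t is possible for an object that was exactly
    at sample_a = (pa, ta) and sample_b = (pb, tb) and never exceeds v_max."""
    (pa, ta), (pb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = math.dist(pa, p) <= v_max * (t - ta)   # forward cone
    can_still_reach_b = math.dist(p, pb) <= v_max * (tb - t)  # backward cone
    return reachable_from_a and can_still_reach_b


# Hypothetical samples: 10 units apart, 10 time units apart, speed bound 2
a = ((0.0, 0.0), 0.0)
b = ((10.0, 0.0), 10.0)
on_straight_line = in_prism((5.0, 0.0), 5.0, a, b, v_max=2.0)
detour = in_prism((5.0, 6.0), 5.0, a, b, v_max=2.0)
too_far = in_prism((5.0, 9.0), 5.0, a, b, v_max=2.0)
too_early = in_prism((1.0, 0.0), 0.4, a, b, v_max=2.0)
```

The bead is the intersection of the two cones; a sequence of such beads over consecutive samples forms the necklace.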

57

Modeling Uncertain Trajectories: Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q, t1, t2, t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children of the same form over sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and these recursively have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd marked; the minimum distance of Tr4 is > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose minimum distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by the integral of pdf(Tri) over the part of its uncertainty disk that lies within distance Rd of Q

The probability that a given object Trj is the NN of Q is given by an integral over Rd,

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax …)
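The pruning observation can be sketched as follows, assuming that every uncertain object is a disk given by a centre and a radius and that Rmax is the smallest of the objects' maximum distances from Q. The object names and coordinates are hypothetical.

```python
import math


def prune_candidates(q, disks):
    """Return the names of objects that can still be the NN of the crisp point q.

    disks maps a name to (centre, radius). An object is pruned when its
    minimum distance from q exceeds Rmax, the smallest maximum distance."""
    def d_min(centre, radius):
        return max(0.0, math.dist(q, centre) - radius)

    def d_max(centre, radius):
        return math.dist(q, centre) + radius

    r_max = min(d_max(c, r) for c, r in disks.values())
    return sorted(name for name, (c, r) in disks.items() if d_min(c, r) <= r_max)


disks = {
    "Tr1": ((2.0, 0.0), 1.0),   # distances from Q: min 1, max 3 (defines Rmax)
    "Tr2": ((0.0, 3.0), 1.0),   # min 2 <= 3, stays a candidate
    "Tr4": ((6.0, 0.0), 1.0),   # min 5 > 3, pruned
}
candidates = prune_candidates((0.0, 0.0), disks)
```

Only the surviving candidates need the expensive integration over Rd.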

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable … Example:

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5, with Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide) …

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane; each is a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
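The convolution argument can be checked on a small discrete example. The sketch below is a simplification made for illustration: it uses 1D integer locations instead of 2D disks, assumes independence, and all names and values are hypothetical.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent Vi and Vq, i.e. the convolution
    of the pmf of Vi with the pmf of -Vq."""
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)


def prob_within(pmf_diff, rd):
    """P(|Vi - Vq| <= rd): a single sum over the difference variable."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)


pmf_i = {4: 0.5, 5: 0.5}   # hypothetical possible locations of Tri
pmf_q = {0: 0.5, 1: 0.5}   # hypothetical possible locations of Trq
diff = difference_pmf(pmf_i, pmf_q)
p_within_4 = prob_within(diff, 4)
```

Once the difference variable is available, the uncertain query object behaves like a crisp one at the origin, which is what brings the problem back to the setting of the crisp-query slide.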

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane (axis ticks at –40 and +40); each has a support of diameter 4r and a peak of height 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is of the form $d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{xiq}^2 + V_{yiq}^2$

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
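A small sketch of these distance functions, assuming linear motion with constant velocity between samples. The squared distance is a quadratic in t, so two squared distance functions differ by a quadratic and can coincide at no more than two time instants. Function names and example values are hypothetical.

```python
import math


def distance_coeffs(p_i, v_i, p_q, v_q):
    """Coefficients (A, B, C) of the squared distance A*t^2 + B*t + C between
    the expected locations of Tri and Trq, both moving linearly."""
    dx, dy = p_i[0] - p_q[0], p_i[1] - p_q[1]   # relative position at t = 0
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]   # relative velocity
    return (vx * vx + vy * vy, 2.0 * (dx * vx + dy * vy), dx * dx + dy * dy)


def critical_times(c1, c2, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal."""
    a, b, c = (c1[k] - c2[k] for k in range(3))
    if abs(a) < eps:                      # difference is linear (or constant)
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical example: static query at the origin, Tr1 passes by, Tr2 is static
q_pos, q_vel = (0.0, 0.0), (0.0, 0.0)
d1 = distance_coeffs((-5.0, 1.0), (1.0, 0.0), q_pos, q_vel)   # t^2 - 10t + 26
d2 = distance_coeffs((0.0, 3.0), (0.0, 0.0), q_pos, q_vel)    # constant 9
times = critical_times(d1, d2, 0.0, 10.0)
```

Between the two returned times Tr1 is closer to the query than Tr2; outside of them Tr2 is closer.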

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with the critical time-points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) has probability 1/|PPi(a)|

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).

73

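Combine the two probabilities …

[Figure: possible paths and possible locations of object a when t = 4, with a point m marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \mid PL_p^a(t))$

i.e. the probability of each possible path p times the probability of lying in S among the possible locations on p, e.g. with a uniform pdf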
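The two ingredients can be sketched together: enumerate the possible paths under a time budget, then weight the per-path location probability by the path probability. The road network, the travel times and the per-path fractions below are hypothetical and chosen so that the result matches the 0.5 · 0.5 = 0.25 example; a uniform pdf over paths is assumed.

```python
def possible_paths(graph, source, target, time_budget):
    """Enumerate simple paths from source to target whose total minimum
    travel time does not exceed time_budget (depth-first search)."""
    found = []

    def visit(node, cost, path):
        if cost > time_budget:
            return
        if node == target:
            found.append((tuple(path), cost))
            return
        for neighbour, travel_time in graph.get(node, {}).items():
            if neighbour not in path:
                visit(neighbour, cost + travel_time, path + [neighbour])

    visit(source, 0.0, [source])
    return found


# Hypothetical road network: node -> {neighbour: minimum travel time}
graph = {
    "p_i": {"v1": 1.0},
    "v1": {"v5": 3.0, "v3": 2.0, "v6": 5.0},
    "v3": {"v4": 2.0},
    "v4": {"p_next": 1.0},
    "v5": {"p_next": 2.0},
    "v6": {"p_next": 5.0},
}
paths = possible_paths(graph, "p_i", "p_next", time_budget=8.0)
path_prob = 1.0 / len(paths)                     # uniform pdf over possible paths

# Hypothetical fraction of each path's possible locations (at t = 4) inside S
fraction_in_segment = {
    ("p_i", "v1", "v5", "p_next"): 0.5,
    ("p_i", "v1", "v3", "v4", "p_next"): 0.0,
}
prob_in_segment = sum(path_prob * fraction_in_segment[p] for p, _ in paths)
```

The detour via v6 needs 11 time units and is therefore not a possible path under the budget of 8.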

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q. Annotations: the expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997
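To make the "define an acceptable error, save on data size" idea concrete, here is a sketch of the basic recursive Douglas-Peucker simplification. Note that this is the plain textbook version, not the sped-up variant of the cited paper, and the sample polyline and tolerance are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))            # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the end points; recursively keep the farthest intermediate point
    whenever it deviates by more than epsilon from the current chord."""
    if len(points) < 3:
        return list(points)
    index, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right             # the split point appears only once


route = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(route, epsilon=0.5)
```

The dropped point (1, 0.05) deviates by less than the tolerance, so the receiver can treat the tolerance as a bound on the location uncertainty of the reduced route. For trajectories, the time dimension needs separate care, as the following slide points out.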

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis … one cannot travel back in time
– How long should the sink allow the data to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

83

A highway example …

[Figure: position (pos) over time (t), with a query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example …

[Figure: as before, with the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example …

[Figure: as before, with vmin, vmax and Q]

Assuming uniform distribution …

86

A highway example …

[Figure: as before; labels 50 and 30 at the two time instants of Q, i.e. probabilities 0.5 and 0.3 of being inside Q]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example …

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example …

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5
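The gap between 0.65 and 0.5 can be reproduced with a toy set of possible worlds. The ten equally likely trajectories below are a hypothetical construction, chosen so that every world that is inside Q at the second time instant was already inside Q at the first one, as the speed constraints in the highway example would enforce.

```python
# Each world records whether the car is inside Q at the two time instants.
worlds = (
    [(True, True)] * 3      # inside Q at both instants
    + [(True, False)] * 2   # inside Q only at the first instant
    + [(False, False)] * 5  # never inside Q
)

p_first = sum(w[0] for w in worlds) / len(worlds)    # marginal at instant 1
p_second = sum(w[1] for w in worlds) / len(worlds)   # marginal at instant 2

# Treating the two instants as independent overestimates the probability
independent = 1.0 - (1.0 - p_first) * (1.0 - p_second)

# Possible worlds semantics: fraction of worlds inside Q at least once
dependent = sum(any(w) for w in worlds) / len(worlds)
```

The marginals are identical in both computations; only the correlation between the two time instants differs.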

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole with probabilities 0.4 (left), 0.2 (stay), 0.4 (right); border hole with probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0
  0.4  0.2  0.4  0    0    0    0    0    0
  0    0.4  0.2  0.4  0    0    0    0    0
  0    0    0.4  0.2  0.4  0    0    0    0
  0    0    0    0.4  0.2  0.4  0    0    0
  0    0    0    0    0.4  0.2  0.4  0    0
  0    0    0    0    0    0.4  0.2  0.4  0
  0    0    0    0    0    0    0.4  0.2  0.4
  0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M² = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M⁴⁰ = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

where M is the 9 × 9 transition matrix of the previous slide
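The vector-matrix products of this example can be replayed in plain Python. The sketch below rebuilds the 9 × 9 matrix from the probabilities given for the wooden board (0.4 / 0.2 / 0.4 in the interior, 0.4 / 0.6 at the two borders); the helper names are made up for this illustration.

```python
def build_matrix(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = build_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball in the third hole
first = after(start, M, 1)
second = after(start, M, 2)
fortieth = after(start, M, 40)

# The distribution shown for the 40th hit is the stationary one: one more
# hit leaves it unchanged.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

Repeated multiplication is enough here; for large and sparse matrices a sparse representation would be the natural choice.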

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1
  0.6  0    0.4
  0    0.8  0.2 )

[Figure: state graph over s1, s2 and s3 with the transition probabilities of M: 1.0 (s1 to s3), 0.6 (s2 to s1), 0.4 (s2 to s3), 0.8 (s3 to s2), 0.2 (s3 to s3)]

Note: we have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1
  0.6  0    0.4
  0    0.8  0.2 )

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]

t = 0: (1, 0, 0)

99

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

100

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

101

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0, 0.16, 0.04), propagating only the worlds that have not hit the window at t = 2

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

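M− (used for a transition to a time outside T) =
( 0    0    1    0
  0.6  0    0.4  0
  0    0.8  0.2  0
  0    0    0    1 )

M+ (used for a transition to a time inside T) =
( 0  0  1    0
  0  0  0.4  0.6
  0  0  0.2  0.8
  0  0  0    1 )

States: s1, s2, s3 and the new absorbing state s4

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)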
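The construction of the two matrices can be sketched in a few lines. This is only an illustration under stated assumptions: states are numbered from 0, the query window is a set of state indices, the time interval starts at t ≥ 1, and the function names are made up here.

```python
def augment(m, window):
    """Build M- and M+ with one extra absorbing state for worlds that have
    already hit the window. M+ redirects every transition that ends in a
    window state to the absorbing state."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new_row)
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus


def vec_mat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_probability(m, start, window, t_from, t_to):
    """P(object is in a window state at least once in [t_from, t_to]),
    for an object observed in state `start` at t = 0 (assumes t_from >= 1)."""
    m_minus, m_plus = augment(m, window)
    v = [1.0 if i == start else 0.0 for i in range(len(m))] + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    return v[-1]


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
result = window_probability(M, start=0, window={0, 1}, t_from=2, t_to=3)
```

The cost is one vector-matrix product per time step, instead of enumerating the exponentially many paths.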

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1
  0.6  0    0.4
  0    0.8  0.2 )

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

Class order: (S1−, S2−, S3−, S1+, S2+, S3+)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M− =
( M  0
  0  M )

M+ =
( 0  0  1    0    0    0
  0  0  0.4  0.6  0    0
  0  0  0.2  0    0.8  0
  0  0  0    0    0    1
  0  0  0    0.6  0    0.4
  0  0  0    0    0.8  0.2 )

with M =
( 0    0    1
  0.6  0    0.4
  0    0.8  0.2 )

107

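Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$

where ∎ denotes the event that the trajectory intersects the query window and obs denotes the observation

With the class probabilities (0, 0, 0.04, 0.48, 0.16, 0.32) for (S1−, S2−, S3−, S1+, S2+, S3+):

0.32 / (0.32 + 0.04) = 0.89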
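The 2|S| classes and the final Bayes step can be sketched as follows. The sketch assumes the class order (S1−, S2−, S3−, S1+, S2+, S3+), 0-based state indices, a window over the states {s1, s2} during T = [2, 3], and a second observation of the object in s3 at t = 3; all names are made up for this illustration.

```python
def lift(m, window):
    """Build the 2|S| x 2|S| matrices M- and M+ over the classes
    (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # Outside the query time: hit and non-hit worlds evolve separately.
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    # Inside the query time: transitions into window states move to a + class.
    m_plus = []
    for row in m:
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    for row in m:
        m_plus.append(zero + row[:])
    return m_minus, m_plus


def vec_mat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
n = len(M)
m_minus, m_plus = lift(M, window={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # t = 0: in s1, window not yet hit
v = vec_mat(v, m_minus)                  # t = 1 is outside T = [2, 3]
v = vec_mat(v, m_plus)                   # t = 2 is inside T
v = vec_mat(v, m_plus)                   # t = 3 is inside T

# Second observation: the object is seen in s3 (index 2) at t = 3.
observed = 2
posterior = v[n + observed] / (v[observed] + v[n + observed])
```

Conditioning only requires the two classes of the observed state, S3− and S3+, which is why the classes have to keep the state and the hit flag together.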

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index over the (time, location) space; labels P, oid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaption of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 46: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

46

The Paradigm of Equivalent Worlds

47

Approximated Results Sampling

Materialize a set S of possible worlds

Samples drawn independent and unbiased

Evaluated the query predicate on each world

Distribution of sampled results is an unbiased approximation of the true distribution of results

48

q

08 02

04

H

C A

B

H3

Sampling Example

Drawing 100 possible worlds may yield the followingestimators

Compare to the exact probabilities

No indication of reliability or confidence of estimations

49

q

08 02

04

H

C A

B

H3

Sampling Confidences

Drawing 100 possible worlds may yield the followingestimators

Use statistical methods to assess the quality of estimators

Eg Wald-Test

Where is the percentile of the standard normal distribution

At a significance level of the true probability is in the interval [0442 0638]

True probability

50

Uncertain Spatial Data Management Summary

Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty

Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results

Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions

Approximations Monte-Carlo sampling Probabilistic guarantees

51

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

52

bull Modelsbull Motionbull Location Uncertainty

bull Queriesbull Part 1 NN (free-space motion)

bull Semantics and Processing

bull Part 2 Range (road-networks constraints)bull Semantics and Processing

bull Uncertainty ndash the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation does not capture accelerationdeceleration

Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes

Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011

54

Mobility Models

sequence of (locationtime)updates (eg GPS-based)

Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory

(locationtime Velocity Vector )updates

now -gt

55

Modeling UncertaintyLocation Uncertainty Models

Various Sourcesrsquo Imprecision(GPS Sensors)

Quest for data reduction

Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009

56

Modeling Uncertain Trajectories

bull Model 1Cones Beads Necklaces (AKA space-time prisms)

bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples

Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010

57

Modeling Uncertain TrajectoriesModel 2

Constraints

Motion along road network

Uncertainty restricted to Road Segments

p1 p2

q

t2t1

t

Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories: Model 3, Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints (e.g., possible routes to be taken)

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [t(m-1), te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized.

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Each node has children of the same form that cover sub-intervals of its time interval, e.g., [Tri11, tb, t11, D11] and [Tri12, t11, t12, D12] below the first node, and so on at deeper levels, e.g., [Tri121, t11, t121, D121].]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, annotated with the distances Rmin, Rmax and Rd; one disk is marked "Rmin > Rmax"]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the pdf of Tri over the part of its uncertainty disk that lies within distance Rd of Q (a double integral).

The probability that a given object Trj is the NN of Q: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…).
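The two probabilities described on this slide can be sketched in formula form as follows. The notation is an assumption made here for illustration: $A_i(R_d)$ denotes the part of the uncertainty disk of $Tr_i$ within distance $R_d$ of $Q$, and $\mathrm{pdf}^{WD}_{j,Q}$ denotes the derivative of $P^{WD}_{j,Q}$ with respect to $R_d$.

```latex
P^{WD}_{i,Q}(R_d) = \iint_{A_i(R_d)} \mathrm{pdf}_i(x, y)\, dx\, dy

P^{NN}_{j,Q} = \int_{0}^{R_{max}} \mathrm{pdf}^{WD}_{j,Q}(R_d)
               \prod_{i \neq j} \bigl(1 - P^{WD}_{i,Q}(R_d)\bigr)\, dR_d
```

The product term expresses "all the rest are at distances greater than $R_d$", and the upper integration bound $R_{max}$ reflects that no object farther away than $R_{max}$ can be the nearest neighbor.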

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5, annotated with Rmin and Rmax; Z1 and Z2 are two possible locations with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each defined over an uncertainty disk of diameter 2r]

Example, assuming a uniform pdf of possible locations: the figure shows only two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi − Vq (cross-correlation) as another random variable. Since Viq = Vi + (−Vq), the independence of Vi and Vq implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
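The convolution argument can be illustrated with a small discrete, one-dimensional analogue. This is a sketch under simplifying assumptions (1D locations, finite pmfs instead of 2D pdfs); the function names and the example values are illustrative only.

```python
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent discrete Vi and Vq.

    This is the convolution of pmf_i with the mirrored pmf of Vq.
    """
    out = defaultdict(float)
    for xi, p_i in pmf_i.items():
        for xq, p_q in pmf_q.items():
            out[xi - xq] += p_i * p_q
    return dict(out)

def prob_within(pmf_diff, rd):
    """Probability that |Vi - Vq| <= rd."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)

# Hypothetical 1D locations with their probabilities
pmf_i = {4: 0.25, 5: 0.5, 6: 0.25}
pmf_q = {0: 0.5, 1: 0.5}
diff = difference_pmf(pmf_i, pmf_q)
```

Once the pmf of the difference is available, the uncertain query object can be treated as if it were crisp and located at the origin, which is what reduces the quadruple integration to the simpler case.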

67

[Figure: the convolved pdfs pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane; each has a support of diameter 4r]

Let Viq = Vi + (−Vq)

Property 1
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

diq(t) = sqrt(A·t² + B·t + C), with A = Vxiq² + Vyiq²

which is a hyperbola since A > 0.

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions
Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with the critical time-points t11 and t12 marked. NOTE: time dimension on the horizontal axis, distance on the vertical axis.]
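A small sketch of how critical time-points can be computed, assuming each object moves linearly relative to the query object. Squaring both distance functions turns diq(t) = djq(t) into a quadratic equation in t, hence at most two intersections. The function names and the example motions are assumptions for illustration.

```python
import math

def dist_sq_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C.

    (x0, y0) is the position and (vx, vy) the velocity relative to the query object.
    """
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] at which the two distance functions intersect."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        if abs(b) < eps:
            return []                      # parallel squared distances: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical relative motions
c1 = dist_sq_coeffs(-5.0, 1.0, 1.0, 0.0)   # d^2 = t^2 - 10 t + 26
c2 = dist_sq_coeffs(0.0, 2.0, 0.0, 0.0)    # d^2 = 4 (no relative motion)
times = critical_times(c1, c2, 0.0, 10.0)
```

Between two consecutive critical time-points the ranking of the two distance functions cannot change, which is why the lower envelope only needs to be re-examined at these instants.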

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the pruning band of width 4r above the lower envelope]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.

Example: the pdf of selecting a possible path is uniform.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
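The definition of possible paths can be sketched as a bounded depth-first enumeration. This is an illustrative sketch only: it assumes that both samples lie on vertices, that each edge carries its minimum travel time, and that only simple paths are of interest. The graph and the function name `possible_paths` are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose minimum time cost is <= budget.

    graph: {vertex: {neighbour: minimum_time_cost_of_edge}}
    Returns a list of (path, cost) pairs.
    """
    results = []

    def dfs(v, path, cost):
        if cost > budget:
            return                      # cannot arrive in time along this route
        if v == dst:
            results.append((list(path), cost))
            return
        for w, c in graph[v].items():
            if w not in path:           # simple paths only
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0)
    return results

# Hypothetical road network (symmetric edges, values are minimum travel times)
graph = {
    "v1": {"v5": 5, "v3": 2, "v2": 4},
    "v2": {"v1": 4, "v5": 6},
    "v3": {"v1": 2, "v4": 2},
    "v4": {"v3": 2, "v5": 2},
    "v5": {"v1": 5, "v4": 2, "v2": 6},
}
paths = possible_paths(graph, "v1", "v5", 6)
```

Under a uniform pdf over the possible paths, each of the returned paths would be assigned the probability 1 / len(paths).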

73

Combine the two probabilities…

[Figure: possible paths of object a and its possible locations when t = 4, including the segment between m and p1]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., uniform pdf):

\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(S \cap PL^{a}_{p}(t)\bigr)
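The combination of path probability and location probability amounts to a weighted sum over the possible paths. The sketch below assumes the per-path probabilities are already known; the two-path input is a hypothetical instance chosen to match the 0.5 · 0.5 = 0.25 computation on this slide.

```python
def prob_in_segment(paths):
    """Probability that the object lies inside a segment S at time t.

    paths: list of (probability_of_path, probability_of_being_in_S_given_path)
    """
    return sum(p_path * p_loc for p_path, p_loc in paths)

# Two equally likely paths; on the first, half of the possible locations at t = 4
# fall inside S, on the second, none of them do.
p = prob_in_segment([(0.5, 0.5), (0.5, 0.0)])
```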

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on-demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q; the expansion tree "bubbles" along road-network segments; judging by its earliest/latest times of arrival, one trajectory lies on a possible path but is definitely not part of the answer to the range query]
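The network expansion step can be sketched as Dijkstra's algorithm truncated at network distance r. This is an illustrative sketch, assuming that q lies on a vertex and that an edge qualifies as soon as one of its endpoints is closer than r; the graph and the function name `network_expansion` are hypothetical.

```python
import heapq

def network_expansion(graph, q, r):
    """Dijkstra from q, truncated at network distance r.

    graph: {vertex: {neighbour: edge_length}}
    Returns (dist, edges): the distances of all vertices closer than r, and the
    edges incident to such a vertex (edges that are at least partly within range).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                                  # stale heap entry
        for w, length in graph[v].items():
            edges.add(frozenset((v, w)))
            nd = d + length
            if nd < r and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist, edges

# Hypothetical road network
graph = {
    "q": {"A": 1, "D": 4},
    "A": {"q": 1, "B": 2},
    "B": {"A": 2, "C": 5},
    "C": {"B": 5},
    "D": {"q": 4, "E": 3},
    "E": {"D": 3},
}
dist, edges = network_expansion(graph, "q", 4)
```

Only the movement R-trees of the returned edges would then need to be fetched in the filtering step.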

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches
  – Define an acceptable error
  – Save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
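As a sketch of the "define an acceptable error, save on data size" idea, the following is the classic recursive Douglas-Peucker line simplification (not the sped-up variant of the cited paper). It treats the trajectory as a purely spatial polyline, which is a simplifying assumption; the example points are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed to stay within distance eps of the original polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= eps:
        return [points[0], points[-1]]      # everything in between is within the error bound
    split = k + 1                            # index of the farthest point in `points`
    left = douglas_peucker(points[:split + 1], eps)
    right = douglas_peucker(points[split:], eps)
    return left[:-1] + right                 # avoid duplicating the split point

track = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, 0.5)
```

The error bound eps then plays the role of a known, bounded location uncertainty for every dropped point.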

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – Time is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q]

What is the probability that the car is in some area at least once during some time?

84

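A highway example…

[Figure: position (pos) over time (t) with query window Q; the possible positions are bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?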

85

A highway example…

[Figure: position (pos) over time (t) with query window Q; the possible positions are bounded by the speeds vmin and vmax]

Assuming uniform distribution…

86

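A highway example…

[Figure: position (pos) over time (t) with query window Q; the probabilities of being inside Q at the two query times are 50% and 30%]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65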

87

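A highway example…

[Figure: position (pos) over time (t) with query window Q; the probabilities of being inside Q at the two query times are 50% and 30%]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints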

88

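A highway example…

[Figure: position (pos) over time (t) with query window Q; the probabilities of being inside Q at the two query times are 50% and 30%]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5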
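The gap between the two results can be reproduced with a toy set of possible worlds. The ten equally likely trajectories below are a hypothetical construction, chosen so that every world inside the query area at the second time was also inside it at the first time (which is one way the marginals 0.5 and 0.3 can lead to the answer 0.5).

```python
# Ten equally likely possible trajectories (hypothetical worlds).
# Each pair states whether the car is inside the query area at time t1 and at time t2.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

p_t1 = sum(a for a, _ in worlds) / len(worlds)   # marginal probability at t1
p_t2 = sum(b for _, b in worlds) / len(worlds)   # marginal probability at t2

# Treating the two time instants as independent events
p_independent = 1 - (1 - p_t1) * (1 - p_t2)

# Counting the worlds that are inside the area at least once
p_possible_worlds = sum(a or b for a, b in worlds) / len(worlds)
```

The independence assumption overestimates the probability because it also counts combinations of positions that no single trajectory obeying the speed constraints could realize.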

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

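› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a ball in a row of holes; in the interior it moves left with probability 0.4, stays with 0.2 and moves right with 0.4; at the border it moves to the neighbour hole with 0.6 and stays with 0.4]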

92

A simple example

› Initial position: the ball is in one hole with probability 1

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this

With M being the 9 × 9 transition matrix of the example:

› First hit:
  (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit:
  (0 0 1 0 0 0 0 0 0) · M² = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit:
  (0 0 1 0 0 0 0 0 0) · M⁴⁰ = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
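The vector-matrix products can be sketched in a few lines of plain Python. The helper names are assumptions for illustration; the matrix entries are those of the example (interior rows 0.4, 0.2, 0.4; border rows 0.4 stay and 0.6 move).

```python
def build_transition_matrix(n=9):
    """Transition matrix of the ball example: row = from bucket, column = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: multiply the row vector dist with the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, m, k):
    """Distribution after k hits, i.e. dist * m^k."""
    for _ in range(k):
        dist = step(dist, m)
    return dist

M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # the ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
```

Because each row of M has only a few non-zero entries, the same computation could be carried out with a sparse representation when the state space is large.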

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: state graph over s1, s2, s3 with edges labelled by the transition probabilities]

Note: We have an exponential number of possible paths the car might take.

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labelled by the transition probabilities]

Distribution at t = 0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labelled by the transition probabilities]

Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)

100

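ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labelled by the transition probabilities]

Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2)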

101

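ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labelled by the transition probabilities]

Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2)
Distribution at t = 3: (0.48, 0.16, 0.36)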

102

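ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges labelled by the transition probabilities]

Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2)
Distribution at t = 3, restricted to the worlds that did not hit the window at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories, and two matrices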

103

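ST - Window Queries

States: s1, s2, s3, plus the new state s4 for the trajectories that have already hit the window.

M− =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M+ =
( 0    0    1    0   )
( 0    0    0.4  0.6 )
( 0    0    0.2  0.8 )
( 0    0    0    1   )

t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)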
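A sketch of the two-matrix idea in plain Python. The function name, the 0-based state indices and the treatment of t = 0 are assumptions made for this illustration; the transition matrix and the window (s1 or s2 during T = [2, 3]) are those of the running example.

```python
def window_query_probability(m, start, window_states, window_times, t_end):
    """Probability that the object is in a window state at least once during window_times."""
    n = len(m)
    hit = n                                            # index of the extra absorbing state
    # M-: original transitions plus the absorbing hit state
    m_minus = [list(row) + [0.0] for row in m] + [[0.0] * n + [1.0]]
    # M+: transitions that lead into the window are redirected to the hit state
    m_plus = [list(row) for row in m_minus]
    for i in range(n):
        for j in window_states:
            m_plus[i][hit] += m_plus[i][j]
            m_plus[i][j] = 0.0
    dist = list(start) + [0.0]
    if 0 in window_times:                              # the object may start inside the window
        for j in window_states:
            dist[hit] += dist[j]
            dist[j] = 0.0
    for t in range(1, t_end + 1):
        mat = m_plus if t in window_times else m_minus
        dist = [sum(dist[i] * mat[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist[hit]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# s1, s2, s3 are indices 0, 1, 2; the object starts in s1 at t = 0
p = window_query_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3)
```

The absorbing state collects the probability mass of all worlds that have intersected the window, so the exponential number of possible paths never has to be enumerated.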

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; possible locations extrapolated from a single observation at t0 (left) versus possible locations between two observations at t0 and t1 (right)]

105

Multiple Observations

› We need to track where the true hit worlds are located
  – 2·|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

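State order: (S1-, S2-, S3-, S1+, S2+, S3+). The "+" classes contain the worlds that have intersected the window (∎), the "-" classes those that have not (not ∎).

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M− =
( M  0 )
( 0  M )

M+ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )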

107

Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ = "the trajectory intersects the query window" and obs = "the object is observed in s3":

P(∎ | obs) = P(obs | ∎) · P(∎) / P(obs)
           = P(∎ ∧ obs) / P(obs)
           = P(∎ ∧ obs) / (P(obs ∧ ∎) + P(obs ∧ ¬∎))
           = 0.32 / (0.32 + 0.04) = 0.89

Distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)
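The doubled state space and the final Bayes step can be sketched as follows. The function name and the index layout (first the "-" classes, then the "+" classes) are assumptions for this illustration, and the sketch presumes that the object does not start inside the window and that the observation is possible (non-zero denominator).

```python
def hit_probability_given_observation(m, start, window_states, window_times, t_obs, s_obs):
    """P(window was intersected | object observed in state s_obs at time t_obs)."""
    n = len(m)

    def build(redirect):
        # Indices 0..n-1 are the "-" classes, n..2n-1 the "+" classes.
        big = [[0.0] * (2 * n) for _ in range(2 * n)]
        for i in range(n):
            for j in range(n):
                # From a "-" class: entering a window state switches to the "+" class.
                target = j + n if (redirect and j in window_states) else j
                big[i][target] += m[i][j]
                # A "+" class stays "+" forever.
                big[i + n][j + n] = m[i][j]
        return big

    m_minus, m_plus = build(False), build(True)
    dist = list(start) + [0.0] * n
    for t in range(1, t_obs + 1):
        mat = m_plus if t in window_times else m_minus
        dist = [sum(dist[i] * mat[i][j] for i in range(2 * n)) for j in range(2 * n)]
    p_obs_and_hit = dist[s_obs + n]
    p_obs_and_miss = dist[s_obs]
    return p_obs_and_hit / (p_obs_and_hit + p_obs_and_miss)

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1 at t = 0, window {s1, s2} during T = [2, 3], second observation: s3 at t = 3
p = hit_probability_given_observation(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3, 2)
```

For the running example this evaluates the ratio 0.32 / (0.32 + 0.04) from the slide.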

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries for an object id over time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


47

Approximated Results: Sampling

Materialize a set S of possible worlds

Samples are drawn independently and without bias

Evaluate the query predicate on each world

The distribution of the sampled results is an unbiased approximation of the true distribution of results

48

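Sampling Example

[Figure: query point q and the uncertain objects A, B, C and H; alternative positions are annotated with the probabilities 0.8, 0.2 and 0.4]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of the estimations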

49

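Sampling Confidences

[Figure: query point q and the uncertain objects A, B, C and H; alternative positions are annotated with the probabilities 0.8, 0.2 and 0.4; the true probability is marked within the confidence interval]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of the estimators

E.g., Wald test, where z is the corresponding percentile of the standard normal distribution

At a significance level of α, the true probability is in the interval [0.442, 0.638]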
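A sketch of the Wald confidence interval for a sampled probability. The inputs used here (54 of 100 sampled worlds satisfying the query, significance level α = 0.05) are assumptions, chosen because they reproduce the interval [0.442, 0.638] quoted on the slide; the slide itself does not state them in this transcript.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n independent samples."""
    p_hat = successes / n
    # (1 - alpha/2) percentile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(54, 100)
```

Drawing more worlds shrinks the interval with the square root of the sample size, which is how sampling can offer probabilistic guarantees.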

50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

bull Modelsbull Motionbull Location Uncertainty

bull Queriesbull Part 1 NN (free-space motion)

bull Semantics and Processing

bull Part 2 Range (road-networks constraints)bull Semantics and Processing

bull Uncertainty ndash the flip-side

53

Model of a trajectory

Y

X

Time

Present time

2d-ROUTE

3d-TRAJECTORY

Approximation does not capture accelerationdeceleration

Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes

Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011

54

Mobility Models

sequence of (locationtime)updates (eg GPS-based)

Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory

(locationtime Velocity Vector )updates

now -gt

55

Modeling UncertaintyLocation Uncertainty Models

Various Sourcesrsquo Imprecision(GPS Sensors)

Quest for data reduction

Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009

56

Modeling Uncertain Trajectories

bull Model 1Cones Beads Necklaces (AKA space-time prisms)

bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples

Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010

57

Modeling Uncertain TrajectoriesModel 2

Constraints

Motion along road network

Uncertainty restricted to Road Segments

p1 p2

q

t2t1

t

Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain TrajectoriesModel 3Sheared Cylinders

Constant bound on location-error at anytime-instant

G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints

Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement

eg possible routes to be taken

6060

Example inside range (disk) around ldquoQrdquo between t1 and t2

1 What is the probability of a path taken

2 What is the earliestlatest arrival at a given vertex (consequently along edge)

Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011

61

Continuous NN for trajectories

Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]

A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]

T = tb

T = te

X

Y

T = t1

Trq Tr1 Tr2Tr3

Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN

Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]

Observation The answer to the query is time-paremeterized

Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002

62

Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant

UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]

T = te

X

YT = tb

T = t1

Trq Tr1 Tr2Tr3

T = tb1

T = t11

Observations

adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability

= what should the structure ofthe answer to UQ_nn(q) look like

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]. Its children are (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), …, (Trim, [t(m-1), te], Dm). Each child has its own children over sub-intervals of its interval, e.g. (Tri11, [tb, t11], D11), (Tri12, [t11, t12], D12), …, and these again have children such as (Tri121, [t11, t121], D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q surrounded by the uncertainty disks of Tr1 … Tr5; the distances Rmin, Rmax and a candidate distance Rd are marked; Tr4 is annotated with "Rmin > Rmax"]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by the integral of the pdf of Tri over the disk of radius Rd centered at Q.

The probability that a given object Trj is the NN of Q is obtained as follows: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd; this is integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
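The two formulas described in words above can be sketched as follows. The symbol names ($P^{WD}_i$ for the within-distance probability, $pdf^{WD}_j$ for its derivative, $P^{NN}_j$ for the NN-probability) are assumptions chosen for this sketch and are not taken from the slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy

% pdf^{WD}_j is the derivative of P^{WD}_j with respect to R_d:
% Tr_j is at distance R_d while every other object is further away than R_d
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \cdot \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d
```

The product term is the reason for the pruning rule: beyond $R_{max}$ at least one object is within distance $R_d$ with certainty, so the product vanishes.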

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ surrounded by the uncertainty disks of Tr1 … Tr5; Rmin, Rmax and the points Z1 and Z2 are marked, with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane; each is a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf over the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Since Viq = Vi + (–Vq) and Vi and Vq are independent, the pdf of Viq is the convolution of the pdfs of Vi and –Vq.
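Written out, the convolution argument reads as below. This is a sketch in generic notation; the integration variables $u$ and $v$ are introduced here and do not appear on the slide.

```latex
% pdf of the difference variable V_{iq} = V_i + (-V_q), using independence
pdf_{V_{iq}}(v) = \bigl(pdf_{V_i} * pdf_{-V_q}\bigr)(v) = \iint pdf_{V_i}(u)\, pdf_{V_q}(u - v)\, du

% the within-distance probability becomes a double integral over a disk around the origin
P\bigl(\|V_i - V_q\| \le R_d\bigr) = \iint_{\|v\| \le R_d} pdf_{V_{iq}}(v)\, dv
```

In this form the uncertain query is reduced to the crisp case: the difference variable is compared against a crisp query located at the origin.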

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane; each has a base of diameter 4r and a peak height of 3/(4πr²); the x axis runs from -40 to +40]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form $d_{iq}(t) = \sqrt{A t^2 + B t + C}$: a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, with the critical time-points t11 and t12 after tb. NOTE: the time dimension is on the horizontal axis, the distance (dist) on the vertical axis]
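The critical time-points and the resulting lower envelope can be illustrated with a short sketch. It assumes linear relative motion from an offset (x0, y0) with velocity (vx, vy), so that the squared distance is A·t² + B·t + C with A = vx² + vy², B = 2(x0·vx + y0·vy) and C = x0² + y0². These coefficient definitions and all function names are assumptions made for illustration. The envelope routine is a naive pairwise version, not the divide-and-conquer construction of the tutorial.

```python
import math


def distance_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of the squared distance A*t^2 + B*t + C."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def sq_dist(c, t):
    """Squared distance at time t for coefficients c = (A, B, C)."""
    return c[0] * t * t + c[1] * t + c[2]


def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions intersect (at most two)."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


def lower_envelope(coeffs, tb, te):
    """Naive lower envelope: list of (object index, start, end) pieces."""
    cuts = {tb, te}
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            cuts.update(critical_times(coeffs[i], coeffs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        # Between consecutive critical time-points the ranking cannot change.
        best = min(range(len(coeffs)), key=lambda k: sq_dist(coeffs[k], mid))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], hi)
        else:
            pieces.append((best, lo, hi))
    return pieces


# Hypothetical data: object 0 passes the origin, object 1 stays at distance 5.
c0 = distance_coeffs(10, 0, -1, 0)
c1 = distance_coeffs(0, 5, 0, 0)
print(critical_times(c0, c1, 0, 20))
print(lower_envelope([c0, c1], 0, 20))
```

Comparing squared distances avoids the square root and yields a quadratic equation, which is why two distance functions meet at most twice.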

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11 and t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te]; the merge introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with the critical time-points t1, t2 and t3; shown are the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: the pdf of selecting a possible path is uniform.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4; the segment mp1 is highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with uniform pdfs):

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \mid PL_p^a(t))$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On-disk, retrieved on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q; the expansion tree "bubbles" along the road-network segments; annotations mark the earliest/latest times of arrival and a possible path that is definitely not part of the answer to the range query]
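The network-expansion step can be sketched as a distance-bounded Dijkstra search. The sketch assumes that q coincides with a vertex and that the road network is given as an undirected adjacency dictionary; the function name `expansion_tree` and the toy graph are hypothetical and are not taken from the cited paper.

```python
import heapq


def expansion_tree(graph, source, r):
    """Bounded Dijkstra: vertices closer than r to source, plus touched edges.

    graph maps a vertex to a list of (neighbor, edge_length) pairs.
    Returns (dist, edges), where edges holds every edge with at least one
    endpoint at network distance < r from the source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            edges.add(tuple(sorted((u, v))))
            nd = d + w
            # Only expand vertices that stay within the query radius r.
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


# Hypothetical road network: a chain A - B - C - D.
roads = {
    "A": [("B", 1.0)],
    "B": [("A", 1.0), ("C", 2.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [("C", 5.0)],
}
reached, candidate_edges = expansion_tree(roads, "A", r=4.0)
print(reached)
print(sorted(candidate_edges))
```

The edge (C, D) is only partially within range; it is kept because the filtering step has to inspect the movement R-tree of every edge that the range touches.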

77

Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
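As an illustration of error-bounded data reduction, the following is a minimal sketch of the classic recursive Douglas-Peucker simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and it treats the input as a purely spatial polyline (the time dimension is ignored). The function names and the sample points are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only points needed so that no dropped point deviates by > epsilon."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


samples = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(samples, 0.1))
```

The parameter epsilon plays the role of the acceptable error: every dropped sample lies within epsilon of the simplified polyline, which in turn bounds the location uncertainty introduced by the reduction.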

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink stay "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far…

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
  – Discretize time
  – For each point in time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of the car over time (t), with the query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example…

[Figure: position (pos) of the car over time (t), with the query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example…

[Figure: position (pos) of the car over time (t), with the query window Q and the speed bounds vmin and vmax]

Assuming a uniform distribution…

86

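A highway example…

[Figure: position (pos) over time (t); the car is inside the query window Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65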

87

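A highway example…

[Figure: position (pos) over time (t); the car is inside the query window Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints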

88

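A highway example…

[Figure: position (pos) over time (t); the car is inside the query window Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5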

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g. Markov chain)
  – Discrete time + continuous space (e.g. Harris chain)
  – Continuous time + discrete space (e.g. Markov process)
  – Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4 (left), 0.2 (stay) and 0.4 (right) for an inner hole; 0.6 and 0.4 for a hole at the border]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

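› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$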

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
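The vector-matrix products of this example can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper names `build_transition_matrix`, `step` and `after_hits` are chosen here for illustration. The vector shown for the 40th hit is a fixed point of M (its stationary distribution), so the 40-hit result can be expected to be close to it rather than exactly equal.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: multiply the row vector dist with the matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, M, k):
    """Distribution over the buckets after k hits."""
    for _ in range(k):
        dist = step(dist, M)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
print(after_hits(start, M, 1))
print(after_hits(start, M, 2))
print(after_hits(start, M, 40))
```

For large state spaces the same computation would normally use a sparse matrix representation, since most entries of M are zero.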

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: state graph over s1, s2 and s3, with edges labeled by the transition probabilities of M]

Note: we have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

State distribution: (1, 0, 0) at t = 0

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

State distributions: (1, 0, 0) at t = 0; (0, 0, 1) at t = 1

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

State distributions: (1, 0, 0) at t = 0; (0, 0, 1) at t = 1; (0, 0.8, 0.2) at t = 2

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

State distributions: (1, 0, 0) at t = 0; (0, 0, 1) at t = 1; (0, 0.8, 0.2) at t = 2; (0.48, 0.16, 0.36) at t = 3

102

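ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

State distributions: (1, 0, 0) at t = 0; (0, 0, 1) at t = 1; (0, 0.8, 0.2) at t = 2. The worlds that are in s1 or s2 at t = 2 already count as hits (probability 0.8). Propagating only the remaining worlds gives (0, 0.16, 0.04) at t = 3, of which only the 0.04 in s3 never enters the window.

Result = 1 – 0.04 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices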

103

ST - Window Queries

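$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$

[Figure: state graph over s1, s2, s3 and the additional absorbing state s4 that collects the winner trajectories]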
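The two-matrix scheme can be sketched in plain Python as follows. The construction of M– and M+ from M and the window states mirrors the matrices above. The function names, the 0-based state indices and the assumption that the start time t = 0 lies outside the query interval are choices made for this sketch.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def window_matrices(M, window):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: outside the query time interval nothing is redirected.
    m_minus = [row[:] + [0.0] for row in M] + [absorbing]
    # M+: transitions into window states are redirected to the hit state.
    m_plus = []
    for row in M:
        kept = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = sum(p for j, p in enumerate(row) if j in window)
        m_plus.append(kept + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(M, start, window, t_from, t_to):
    """P(object is in a window state at some t in [t_from, t_to]).

    Assumes the start distribution refers to t = 0 and t_from >= 1.
    """
    m_minus, m_plus = window_matrices(M, window)
    v = list(start) + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    return v[-1]


# States s1, s2, s3 are indices 0, 1, 2; the window is {s1, s2}, T = [2, 3].
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(result)
```

Because the hit state is absorbing, no trajectory is counted twice, and the cost is one vector-matrix product per time step instead of an enumeration of all possible paths.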

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si– corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

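State order of the vectors: (S1–, S2–, S3–, S1+, S2+, S3+)

$(1, 0, 0, 0, 0, 0) \rightarrow (0, 0, 1, 0, 0, 0) \rightarrow (0, 0, 0.2, 0, 0.8, 0) \rightarrow (0, 0, 0.04, 0.48, 0.16, 0.32)$ for t = 0, 1, 2, 3

$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^{+} = \begin{pmatrix}
0 & 0 & 1.0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1.0 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$

with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]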

107

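Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With "hit" denoting that o has intersected the window and "obs" denoting the observation of o in s3:

$$P(hit \mid obs) = \frac{P(obs \mid hit) \cdot P(hit)}{P(obs)} = \frac{P(hit \wedge obs)}{P(obs)} = \frac{P(obs \wedge hit)}{P(obs \wedge hit) + P(obs \wedge \neg hit)}$$

From the vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1–, S2–, S3–, S1+, S2+, S3+):

$$\frac{0.32}{0.32 + 0.04} = 0.89$$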
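The doubled state space and the final conditioning step can be sketched as below. The sketch assumes that the first observation is at t = 0 (outside the query interval) and that the second observation is made at the last time step; the function names and 0-based state indices are hypothetical.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def doubled_matrices(M, window):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    # M-: block-diagonal, worlds keep their class.
    m_minus = [row[:] + zero for row in M] + [zero + row[:] for row in M]
    # M+: "-" worlds that enter a window state switch to the "+" class.
    m_plus = []
    for row in M:
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        move = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + move)
    m_plus += [zero + row[:] for row in M]  # "+" worlds stay "+"
    return m_minus, m_plus


def hit_given_observation(M, start, window, t_from, t_to, observed):
    """P(window was intersected | object observed in state `observed` at t_to)."""
    n = len(M)
    m_minus, m_plus = doubled_matrices(M, window)
    v = [1.0 if i == start else 0.0 for i in range(2 * n)]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    not_hit, hit = v[observed], v[n + observed]
    return v, hit / (hit + not_hit)


# Start in s1 (index 0), window {s1, s2}, T = [2, 3], seen in s3 (index 2) at t = 3.
final, posterior = hit_given_observation(M, 0, {0, 1}, 2, 3, observed=2)
print(final)
print(posterior)
```

Only the two classes that agree with the observation (S3– and S3+) enter the ratio; all other possible worlds are discarded by the conditioning.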

108

Summary

› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries over the axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
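The sampling idea can be illustrated on the window query from the earlier three-state example. The sketch below uses plain forward sampling from a single initial observation, so it does not include the adaptation of the transition matrices to later observations that the cited paper proposes. The function names, the sample size and the seed are arbitrary choices for this illustration.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def sample_trajectory(M, start, steps, rng):
    """Draw one possible world: a state sequence of length steps + 1."""
    states = [start]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_window_probability(M, start, window, t_from, t_to, n, seed=0):
    """Fraction of sampled trajectories that visit the window in [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        traj = sample_trajectory(M, start, t_to, rng)
        if any(traj[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n


# Start in s1 (index 0), window {s1, s2}, T = [2, 3]; the exact answer is 0.96.
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, n=20000)
print(estimate)
```

Since each sample is a complete trajectory, the same set of samples can be reused for queries without an efficient exact solution, such as kNN queries.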

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728


48

Sampling Example

[Figure: query point q and the uncertain objects A, B, C and H; the probabilities 0.8, 0.2 and 0.4 are annotated]

Drawing 100 possible worlds may yield the following estimators

Compare to the exact probabilities

No indication of reliability or confidence of the estimations

49

Sampling Confidences

[Figure: query point q and the uncertain objects A, B, C and H; the probabilities 0.8, 0.2 and 0.4 are annotated; the true probability is marked]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators, e.g. the Wald test, where z is the corresponding percentile of the standard normal distribution

At the chosen significance level, the true probability is in the interval [0.442, 0.638]
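The reported interval can be reproduced with the standard Wald interval p ± z · sqrt(p(1 – p)/n). The sketch below assumes an estimate of 0.54 from n = 100 sampled worlds and a significance level of 0.05. These two values are not stated in the transcript; they are inferred here because they reproduce the interval [0.442, 0.638].

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    # z is the (1 - alpha/2) percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: 54 of 100 sampled worlds satisfy the query predicate.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))
```

The half-width shrinks with the square root of the number of samples, so halving the width of the interval requires roughly four times as many sampled worlds.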

50

Uncertain Spatial Data Management Summary

› Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty

› Data Cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results

› Paradigm of Equivalent Worlds
  – Efficient solution for the most prominent types of spatial queries
  – Example: Generating Functions

› Approximations
  – Monte-Carlo sampling
  – Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, together with its projection onto the (X, Y) plane, the 2D route]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

- Sequence of (location, time) updates (e.g. GPS-based)

- Electronic maps + traffic distribution patterns + set of (to be visited) points ⇒ full trajectory

- (location, time, velocity vector) updates

[Figure label: now →]

55

Modeling Uncertainty

Location uncertainty models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed: exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

- Motion along the road network
- Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 (at t1) and p2 (at t2), with a query location q; the vertical axis is time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints

Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement

eg possible routes to be taken

6060

Example inside range (disk) around ldquoQrdquo between t1 and t2

1 What is the probability of a path taken

2 What is the earliestlatest arrival at a given vertex (consequently along edge)

Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011

61

Continuous NN for trajectories

Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]

A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]

T = tb

T = te

X

Y

T = t1

Trq Tr1 Tr2Tr3

Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN

Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]

Observation The answer to the query is time-paremeterized

Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002

62

Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant

UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]

T = te

X

YT = tb

T = t1

Trq Tr1 Tr2Tr3

T = tb1

T = t11

Observations

adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability

= what should the structure ofthe answer to UQ_nn(q) look like

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree

Tri11 tb t11 D11

Tri12l t12(l-1) t12 D12k12

[TrQ tb te]

Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm

Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm

Tri121 t11 t121 D121

Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]

Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

Rmin gt Rmax

Rmin

Rmax

Q

Tr1

Tr2

Tr3

Tr4

Tr5

Rd1

1

14

Trq = 2D point Q The rest are possible-locations bounded by circles

Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)

65

When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example

Rmin

Rmax

TrQ

Tr1

Tr2

Tr3

Tr4

Tr5

Z1

Z2

dist(Z1Z2) lt Rmax

We can no longer ldquoprunerdquo Tr4 from consideration

The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq

NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the x-y plane, each over a disk of diameter 4r.]

Let Viq = Vi + (–Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs).
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
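The convolution argument and Property 1 can be checked on a small discretized case. The following is a minimal sketch, assuming one-dimensional locations with a uniform pmf on three grid points each (a simplification of the 2D disks on the slides). The names `difference_pmf` and `mean` and the example values are illustrative assumptions, not part of the tutorial.

```python
from collections import defaultdict
from fractions import Fraction

def difference_pmf(pmf_i, pmf_q):
    """Return the pmf of V_i - V_q for independent discrete V_i and V_q.

    This is the convolution of pmf_i with the pmf of -V_q.
    """
    out = defaultdict(Fraction)
    for x_i, p_i in pmf_i.items():
        for x_q, p_q in pmf_q.items():
            out[x_i - x_q] += p_i * p_q
    return dict(out)

def mean(pmf):
    """Expected value (centroid) of a discrete pmf."""
    return sum(x * p for x, p in pmf.items())

third = Fraction(1, 3)
pmf_i = {9: third, 10: third, 11: third}  # hypothetical object, centroid C_i = 10
pmf_q = {3: third, 4: third, 5: third}    # hypothetical query, centroid C_q = 4

pmf_iq = difference_pmf(pmf_i, pmf_q)
c_iq = mean(pmf_iq)                       # equals C_i - C_q
# The 1D analogue of Property 2: the result is symmetric around its centroid.
symmetric = all(pmf_iq[c_iq - k] == pmf_iq[c_iq + k] for k in (1, 2))
```

In this toy case the difference has a triangular pmf on 4..8, centred at 6 = 10 - 4, and its support is twice as wide as that of each input, which mirrors the change from 2r to 4r on the slides.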

68

The continuous aspect of the probabilistic NN queries

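Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$ and with $B$ and $C$ determined by the relative position at the start of the interval,

which is a hyperbola since A > 0.

Now we have a collection of such distance functions (one for each i).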

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) = 0).

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, with critical time-points t11 and t12 after tb. NOTE: the time dimension is on the horizontal axis, the distance on the vertical axis.]
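The "at most twice" observation follows because equating two squared distance functions leaves a quadratic equation in t. The following is a minimal sketch under the assumption of linear motion relative to the query; the function names, the parametrization by a relative start position, and the example values are hypothetical and not taken from the tutorial.

```python
import math

def distance_sq_coeffs(dx0, dy0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C.

    (dx0, dy0) is the position relative to the query at t = 0 and
    (vx, vy) is the velocity relative to the query.
    """
    return (vx * vx + vy * vy,
            2 * (dx0 * vx + dy0 * vy),
            dx0 * dx0 + dy0 * dy0)

def critical_times(c1, c2, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal.

    Distances are non-negative, so comparing squared distances is
    equivalent; the difference is a quadratic with at most two roots.
    """
    a, b, c = (p - q for p, q in zip(c1, c2))
    if abs(a) < eps:
        if abs(b) < eps:
            return []          # parallel or identical functions
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical example: one object passes by the query, the other
# keeps a constant distance (zero relative velocity).
c1 = distance_sq_coeffs(-4, 1, 1, 0)   # d^2 = t^2 - 8t + 17
c2 = distance_sq_coeffs(1, 1, 0, 0)    # d^2 = 2
times = critical_times(c1, c2, tb=0, te=10)
```

Between two consecutive critical time-points the ranking of the two functions cannot change, so it suffices to compare them at one sample time per interval.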

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31), both over [tb, te], are merged into LE1234, which introduces the new critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; shown are the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities...

[Figure: the possible paths of object a, and its possible locations when t = 4; a point m is marked on one path.]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a \in S, t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\left(S \cap PL^{a}_{p}(t)\right)$ (e.g., uniform pdf)
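The combination of the two probabilities can be sketched as a weighted sum over the possible paths. This is a minimal illustration, assuming a uniform choice between two paths and a uniform location along the possible locations of each path; the path labels and the assignment of the segment to one particular path are hypothetical.

```python
def prob_in_segment(path_probs, fraction_in_segment):
    """Pr(object is inside segment S at time t).

    path_probs:          {path: Pr(path)}
    fraction_in_segment: {path: share of the possible locations on that
                          path (at time t) that lies inside S}
    Paths that do not touch S contribute nothing.
    """
    return sum(pr * fraction_in_segment.get(path, 0.0)
               for path, pr in path_probs.items())

# Two equally likely paths (labels are illustrative only).
path_probs = {"v1-v5": 0.5, "v1-v3-v4": 0.5}
# Assumption: S covers half of the possible locations on the first path.
fraction_in_segment = {"v1-v5": 0.5}

p = prob_in_segment(path_probs, fraction_in_segment)
```

With these inputs the result matches the 0.5 · 0.5 = 0.25 of the slide.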

74

Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on-demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query q; the expansion tree is "bubbling" along road-network segments. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query.]

77

Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
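To make "define an acceptable error, save on data size" concrete, here is a minimal sketch of the classic recursive Douglas-Peucker line simplification. It is the basic algorithm, not the sped-up variant of the cited paper, and it treats the input as a purely spatial polyline (the time dimension is ignored). The function names and the sample route are illustrative assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified chain."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [first, last]          # everything in between is close enough
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(route, epsilon=0.1)
```

Here the small bump at (1, 0.05) is dropped, while the detour via (3, 2) is kept because it exceeds the error bound.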

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not a "Z" axis... one cannot travel back in time
  – How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far ...

Stochastic methods for uncertain spatial data
  – Probabilistic results: yes
  – Time dimension: no

vs.

Geometric models for uncertain spatio-temporal data
  – Probabilistic results: no
  – Time dimension: yes

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time instant, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

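A highway example...

[Figure: position (pos) of a car over time (t), with a query window Q.]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example...

[Figure: as before, with the reachable positions bounded by the speeds vmin and vmax.]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example...

[Figure: as before, with vmin, vmax and Q.]

Assuming uniform distribution...

86

A highway example...

[Figure: as before; at the two time instants of Q the car is inside the query area with probability 50% and 30%, respectively.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example...

[Figure: as before, with the probabilities 50% and 30%.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example...

[Figure: as before, with the probabilities 50% and 30%.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5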
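The gap between 0.65 and 0.5 can be reproduced with a few lines. The sketch below compares the independence formula with an explicit enumeration of possible worlds. The three worlds and their probabilities are a hypothetical construction chosen so that the marginals are 0.5 and 0.3 and every world that is inside Q at the second instant was already inside Q at the first one; the slides do not list the worlds explicitly.

```python
def at_least_once_independent(probs):
    """P(inside Q at least once), treating the time instants as independent."""
    miss = 1.0
    for p in probs:
        miss *= (1.0 - p)
    return 1.0 - miss

def at_least_once_worlds(worlds):
    """P(inside Q at least once) from an explicit list of possible worlds.

    worlds: list of (probability, [inside_Q_at_t1, inside_Q_at_t2, ...]).
    """
    return sum(pr for pr, inside in worlds if any(inside))

# Hypothetical possible worlds with marginals 0.5 (first instant)
# and 0.3 (second instant).
worlds = [
    (0.3, [True, True]),    # inside Q at both instants
    (0.2, [True, False]),   # inside Q only at the first instant
    (0.5, [False, False]),  # never inside Q
]

p_independent = at_least_once_independent([0.5, 0.3])
p_dependent = at_least_once_worlds(worlds)
```

The independence formula implicitly counts worlds in which the car jumps into Q at the second instant after being far away at the first, which is what violates the speed constraints.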

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

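› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a wooden board with a row of holes. In the interior the ball moves left with probability 0.4, stays with 0.2 and moves right with 0.4; at the border it stays with 0.4 and moves to the neighbouring hole with 0.6.]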

92

A simple example

› Initial position: the ball is in the third hole

› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› After 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

93

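How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$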

94

How can we model this?

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
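The vector-matrix products of this example are easy to replay. The following is a minimal pure-Python sketch of the nine-bucket chain; the helper names `build_chain` and `step` are illustrative. It reproduces the distributions after the first and second hit and checks that the vector quoted for the 40th hit is a stationary distribution of M, i.e. the limit that the chain approaches after many hits.

```python
def build_chain(n=9):
    """Transition matrix of the ball example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                           # interior bucket
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(v, M):
    """One hit: row vector v times matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

M = build_chain()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # ball in the third bucket

after_1 = step(start, M)
after_2 = step(after_1, M)

# Vector shown on the slide for the 40th hit.
limit = [0.08] + [0.12] * 7 + [0.08]
limit_after_hit = step(limit, M)        # unchanged if it is stationary
```

Because each step is a sparse vector-matrix product, the cost per time step grows with the number of non-zero transitions rather than with the number of possible trajectories.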

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: state diagram of s1, s2, s3 with the transition probabilities of M.]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

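› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

State distribution over (s1, s2, s3):
  – t=0: (1, 0, 0)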

98

99

ST - Window Queries

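› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

State distribution over (s1, s2, s3):
  – t=0: (1, 0, 0)
  – t=1: (0, 0, 1)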

100

ST - Window Queries

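› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

State distribution over (s1, s2, s3):
  – t=0: (1, 0, 0)
  – t=1: (0, 0, 1)
  – t=2: (0, 0.8, 0.2)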

101

ST - Window Queries

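› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

State distribution over (s1, s2, s3):
  – t=0: (1, 0, 0)
  – t=1: (0, 0, 1)
  – t=2: (0, 0.8, 0.2)
  – t=3: (0.48, 0.16, 0.36)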

102

ST - Window Queries

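› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

State distribution over (s1, s2, s3):
  – t=0: (1, 0, 0)
  – t=1: (0, 0, 1)
  – t=2: (0, 0.8, 0.2)
  – t=3: (0, 0.16, 0.04), counting only the worlds that had not yet entered the window at t=2

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices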

103

ST - Window Queries

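$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

State distribution over (s1, s2, s3, s4):
  – t=0: (1, 0, 0, 0)
  – t=1: (0, 0, 1, 0)
  – t=2: (0, 0, 0.2, 0.8)
  – t=3: (0, 0, 0.04, 0.96)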
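The two matrices can be derived mechanically from M and the query window. The sketch below is one possible reading of the construction: the extra absorbing state s4 collects every world that has entered the window; M- is applied for transitions into time steps outside the window, and M+ for transitions into time steps inside it. The function names are illustrative assumptions.

```python
def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

def augment(M, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: ordinary transitions; worlds that already hit stay absorbed.
    m_minus = [row[:] + [0.0] for row in M] + [absorbing]
    # M+: probability of moving into a window state goes to the hit state.
    m_plus = []
    for row in M:
        hit = sum(row[j] for j in window_states)
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        m_plus.append(kept + [hit])
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(M, start, window_states, window_times, t_end):
    """P(object is in a window state at least once during window_times)."""
    m_minus, m_plus = augment(M, window_states)
    v = list(start) + [0.0]
    if 0 in window_times:               # window already open at t = 0
        v[-1] = sum(v[j] for j in window_states)
        for j in window_states:
            v[j] = 0.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[-1], v

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
prob, final = window_probability(M, [1.0, 0.0, 0.0],
                                 window_states={0, 1},
                                 window_times={2, 3}, t_end=3)
```

The whole query costs one vector-matrix product per time step, in contrast to the exponential number of possible paths noted earlier.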

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations, we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with a single observation at t0, the second with observations at t0 and t1.]

105

Multiple Observations

› We need to track where true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: the states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3.]

106

Multiple Observations

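[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3; the classes S1-, S2-, S3- hold the worlds that have not hit the window (not ∎), the classes S1+, S2+, S3+ those that have.]

State distribution over (S1-, S2-, S3-, S1+, S2+, S3+):
  – t=0: (1, 0, 0, 0, 0, 0)
  – t=1: (0, 0, 1, 0, 0, 0)
  – t=2: (0, 0, 0.2, 0, 0.8, 0)
  – t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$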

107

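Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs} \wedge \text{hit}) + P(\text{obs} \wedge \neg \text{hit})} = \frac{0.32}{0.32 + 0.04} = 0.89$

Here "hit" (∎ on the slide) is the event that the trajectory passes the query window, and "obs" is the observation of the object in s3. The values are read from the state distribution at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32).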
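The conditioning step can be replayed with the doubled state space. The following is a minimal sketch that assumes the window is closed at t = 0 and that the observation has non-zero probability; the function names are illustrative, and the construction of M+ is one reading of the slides (transitions from a "-" class into a window state switch to the corresponding "+" class).

```python
def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

def doubled_matrices(M, window_states):
    """M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    plus_rows = [zero + row[:] for row in M]      # '+' worlds stay '+'
    m_minus = [row[:] + zero for row in M] + plus_rows
    m_plus = []
    for row in M:
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + hit)
    m_plus += plus_rows
    return m_minus, m_plus

def hit_given_observation(M, start_state, window_states, window_times,
                          t_obs, obs_state):
    """P(window was hit | object observed in obs_state at time t_obs)."""
    n = len(M)
    m_minus, m_plus = doubled_matrices(M, window_states)
    v = [0.0] * (2 * n)
    v[start_state] = 1.0
    for t in range(1, t_obs + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    p_not_hit, p_hit = v[obs_state], v[n + obs_state]
    return p_hit / (p_hit + p_not_hit), v

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior, dist = hit_given_observation(M, start_state=0,
                                        window_states={0, 1},
                                        window_times={2, 3},
                                        t_obs=3, obs_state=2)
```

The posterior is 0.32 / 0.36, which rounds to the 0.89 of the slide.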

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) plotted over time and location.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
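To see why drawing conforming samples is the hard part, consider the naive alternative. The sketch below is plain rejection sampling, not the adapted transition matrices of the cited paper: trajectories are drawn from the three-state chain of the window-query example and discarded unless they match the observation (object seen in s3 at t=3). All function names, the seed and the sample size are illustrative assumptions.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_trajectory(M, start, steps, rng):
    """Draw one state sequence of length steps + 1 from the Markov chain."""
    traj = [start]
    for _ in range(steps):
        nxt = rng.choices(range(len(M)), weights=M[traj[-1]])[0]
        traj.append(nxt)
    return traj

def estimate_hit_given_observation(M, start, window_states, window_times,
                                   t_obs, obs_state, n_samples, seed=0):
    """Rejection sampling: keep only samples that match the observation."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, t_obs, rng)
        if traj[t_obs] != obs_state:
            continue                    # sample contradicts the observation
        accepted += 1
        if any(traj[t] in window_states for t in window_times):
            hits += 1
    estimate = hits / accepted if accepted else float("nan")
    return estimate, accepted

estimate, accepted = estimate_hit_given_observation(
    M, start=0, window_states={0, 1}, window_times=[2, 3],
    t_obs=3, obs_state=2, n_samples=20000)
```

In this toy chain only about a third of the samples survive the rejection step (the exact acceptance probability is 0.36). With many observations or a large state space the acceptance rate would drop much further, which motivates sampling from transition matrices that are adapted to the observations.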

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial ...
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities...
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far ...
  • Merge the approaches
  • A highway example...
  • A highway example... (2)
  • A highway example... (3)
  • A highway example... (4)
  • A highway example... (5)
  • A highway example... (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 49: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

49

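Sampling Confidences

[Figure: running example with query q and the uncertain objects A, B, C and H; labels 0.8, 0.2, 0.4; the true probability is marked.]

Drawing 100 possible worlds may yield the following estimators

Use statistical methods to assess the quality of estimators

E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution

At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]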
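The quoted interval can be reproduced with the standard Wald formula. The sketch below assumes an estimate of 0.54 from n = 100 sampled worlds and a significance level of 0.05; these inputs are not stated in the text above and were chosen because they yield the interval [0.442, 0.638]. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    # Percentile of the standard normal distribution, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs: 54 of 100 sampled worlds satisfy the query.
low, high = wald_interval(0.54, 100)
```

The half-width shrinks with the square root of the sample count, so halving the interval width requires roughly four times as many sampled worlds.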

50

Uncertain Spatial Data Management: Summary

› Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty

› Data Cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results

› Paradigm of Equivalent Worlds
  – Efficient solution for the most prominent types of spatial queries
  – Example: Generating Functions

› Approximations
  – Monte-Carlo sampling
  – Probabilistic guarantees

51

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location Uncertainty

• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing

• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its projection, the 2d-ROUTE, in the X-Y plane.]

Approximation: does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

  – Sequence of (location, time) updates (e.g., GPS-based)
  – Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
  – (location, time, velocity vector) updates

[Figure: timeline with a marker "now ->".]

55

Modeling Uncertainty
Location Uncertainty Models

  – Various sources' imprecision (GPS, sensors)
  – Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories
Model 2

Constraints
  – Motion along road network
  – Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 at t1 and p2 at t2, with a point q and the time axis t.]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories
Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, ..., TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]

[Figure: the trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, ..., TrN, where each Tri has some location uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: the IPAC-NN tree. The root is [TrQ, tb, te]. Its children are the nodes [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], ..., [Trim, t(m-1), te, Dm]. Each child has children of its own over sub-intervals of its interval, e.g., [Tri11, tb, t11, D11] and [Tri12, t11, t12, D12], and so on recursively, e.g., [Tri121, t11, t121, D121].]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

Rmin gt Rmax

Rmin

Rmax

Q

Tr1

Tr2

Tr3

Tr4

Tr5

Rd1

1

14

Trq = 2D point Q The rest are possible-locations bounded by circles

Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)

65

When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example

Rmin

Rmax

TrQ

Tr1

Tr2

Tr3

Tr4

Tr5

Z1

Z2

dist(Z1Z2) lt Rmax

We can no longer ldquoprunerdquo Tr4 from consideration

The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq

NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r away from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; shown are the lower envelope, the pruning band of width 4r, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, each of the |PPi(a)| possible paths is selected with probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t).
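The definition of possible paths can be illustrated with a small enumeration. This is only a sketch under simplifying assumptions: the samples are taken to lie on vertices, the road network is a hypothetical adjacency dictionary whose edge weights are minimum travel times, and the vertex names and the function `possible_paths` are made up for the example.

```python
def possible_paths(graph, src, dst, time_budget):
    """Enumerate simple paths from src to dst whose minimum time cost
    does not exceed time_budget (the time between the two samples).

    graph maps a vertex to {neighbour: minimum travel time of that edge}.
    """
    results = []

    def dfs(vertex, path, cost):
        if cost > time_budget:            # too slow to arrive in time: prune
            return
        if vertex == dst:
            results.append((list(path), cost))
            return
        for nxt, w in graph.get(vertex, {}).items():
            if nxt not in path:           # simple paths only
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(src, [src], 0)
    return results

# Hypothetical network; edge weights are minimum travel times.
road = {
    "v1": {"v3": 2, "v5": 3},
    "v3": {"v4": 2},
    "v4": {"v2": 2},
    "v5": {"v2": 2},
    "v2": {},
}
paths = possible_paths(road, "v1", "v2", time_budget=6)
# Under a uniform pdf every possible path is equally likely.
path_prob = {tuple(p): 1 / len(paths) for p, _ in paths}
print(path_prob)
```

With a budget of 6 time units both routes qualify; shrinking the budget to 5 leaves only the route via v5.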

73

Combine the two probabilities…

[Figure: possible paths, and possible locations of object a when t = 4; m marks a point on one of the paths.]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(PL_p(t) \in S)$ (e.g. uniform pdf)

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search the leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q. The expansion tree "bubbles" along road-network segments; vertices are annotated with earliest/latest times of arrival, and one possible path is marked as definitely not part of the answer to the range query.]

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
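As an illustration of error-bounded data reduction, here is a minimal sketch of the classic recursive Douglas-Peucker line simplification. It is the textbook version, not the sped-up variant of the cited paper; the tolerance `eps` plays the role of the acceptable error, and the helper names and sample points are assumptions made for the example.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line through a and b, clamped to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Drop points that deviate at most eps from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [first, last]              # everything in between is within eps
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right              # do not repeat the split point

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.1))
```

The sketch treats purely spatial 2D points; a trajectory would additionally need its time-stamps handled, since time is not just another spatial axis.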

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of a car on a highway over time (t), bounded by the minimum and maximum speeds vmin and vmax, together with a query window Q.]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution, the car is inside Q with probability 50% at one point in time covered by the query and with probability 30% at the next one.

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65. This counts possible worlds that violate the speed constraints.

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: transition probabilities of the ball. Interior hole: left 0.4, stay 0.2, right 0.4. Border hole: stay 0.4, move inwards 0.6.]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

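How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (row = from bucket, column = to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$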

94

How can we model this?

› First hit: $(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M = (0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0)$
› Second hit: $(0\;\;0.4\;\;0.2\;\;0.4\;\;0\;\;0\;\;0\;\;0\;\;0) \cdot M = (0.16\;\;0.16\;\;0.36\;\;0.16\;\;0.16\;\;0\;\;0\;\;0\;\;0)$
› 40th hit: $(0\;0\;1\;0\;0\;0\;0\;0\;0) \cdot M^{40} \approx (0.08\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.12\;\;0.08)$
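The propagation of the ball's position can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper names are illustrative, and the matrix is built from the probabilities of the example (0.4 / 0.2 / 0.4 in the interior, 0.4 / 0.6 at the border). The values shown for the 40th hit are the chain's stationary distribution, which the propagated vector approaches as the number of hits grows.

```python
def transition_matrix(n=9):
    """Transition matrix of the ball-and-board example (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6                    # left border
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4    # right border
    return M

def step(v, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(v, M, k):
    """Distribution over the buckets after k hits."""
    for _ in range(k):
        v = step(v, M)
    return v

M = transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]    # the ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
print([round(x, 2) for x in second])
print([round(x, 3) for x in fortieth])
```

Each step costs one vector-matrix product, and since M is sparse (at most three non-zero entries per row) the same idea scales to much larger state spaces with a sparse representation.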

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state-transition graph over s1, s2, s3 labelled with the probabilities of M, unrolled over t = 0, 1, 2, 3.]

Note: We have an exponential number of possible paths the car might take.

Starting in s1 at t = 0, the state distribution evolves as follows:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Dropping, at each t in T, the worlds that are inside the window leaves (0, 0, 0.2) at t = 2 and (0, 0.16, 0.04) at t = 3, so only the 0.04 in s3 has never been in s1 or s2. Result = 0.96

› A solution based on matrix multiplications introduces a new state for the winner trajectories (those that have already hit the window) and two matrices

103

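ST - Window Queries

State order: s1, s2, s3, s4 (s4 is the new state).

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1\;0\;0\;0) \xrightarrow{M^-} (0\;0\;1\;0) \xrightarrow{M^+} (0\;\;0\;\;0.2\;\;0.8) \xrightarrow{M^+} (0\;\;0\;\;0.04\;\;0.96)$$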
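The two-matrix construction can be sketched as follows. The function names are hypothetical; the sketch assumes a single observation at t = 0, appends one absorbing "hit" state (s4 in the example), uses M- for transitions to points in time outside the query interval and M+ for transitions to points in time inside it.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state appended.

    M- keeps the original transitions. M+ redirects every transition that
    enters a window state to the hit state instead.
    """
    n = len(M)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        hit = sum(row[j] for j in window_states)
        m_plus.append([0.0 if j in window_states else row[j] for j in range(n)] + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(M, start, window_states, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = window_matrices(M, window_states)
    v = list(start) + [0.0]
    if t_from <= 0:                       # the start time itself lies in the window
        for j in window_states:
            v[-1] += v[j]
            v[j] = 0.0
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
result = window_probability(M, [1, 0, 0], {0, 1}, t_from=2, t_to=3)
print(round(result, 2))
```

Because hit worlds are parked in the absorbing state, they are counted once and never double-counted, which is what the independence assumption gets wrong.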

104

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time: one with a single observation at t0, one with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3.]

106

Multiple Observations

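Class order: S1-, S2-, S3-, S1+, S2+, S3+ (Si-: window not hit; Si+: window hit)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)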

107

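Bayes' Theorem

› Now what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

With the class probabilities (0, 0, 0.04, 0.48, 0.16, 0.32) for (S1-, S2-, S3-, S1+, S2+, S3+) at t = 3:

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$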
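The class-based bookkeeping and the final Bayes step can be sketched in plain Python. The function names are illustrative; the sketch assumes the first observation places the object in s1 at t = 0, the query window covers s1 and s2 during T = [2, 3], and the second observation sees the object in s3 at t = 3.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def class_matrices(M, window_states):
    """M- and M+ over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    m_minus = [list(r) + zero for r in M] + [zero + list(r) for r in M]
    m_plus = []
    for r in M:   # from a '-' class: entering a window state switches to '+'
        stay = [0.0 if j in window_states else r[j] for j in range(n)]
        move = [r[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + move)
    m_plus += [zero + list(r) for r in M]   # '+' classes remain '+'
    return m_minus, m_plus

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
window_states = {0, 1}                      # s1 and s2
m_minus, m_plus = class_matrices(M, window_states)

v = [1, 0, 0, 0, 0, 0]      # object in s1 at t = 0, window not hit yet
v = vec_mat(v, m_minus)     # t = 1 lies outside T = [2, 3]
v = vec_mat(v, m_plus)      # t = 2
v = vec_mat(v, m_plus)      # t = 3

obs = 2                     # second observation: object seen in s3 at t = 3
n = len(M)
p_hit_given_obs = v[n + obs] / (v[n + obs] + v[obs])
print(round(p_hit_given_obs, 2))
```

Only the two classes that agree with the observation (S3- and S3+) enter the final ratio; all other worlds are ruled out by the observation.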

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index leaf entry (P, oid) over the time and location dimensions.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
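The sampling idea can be sketched for the simplest case. This is a hedged illustration only: it handles a single observation (the start state), reuses the three-state example matrix and window from the ST-window-query slides, and the function names, sample count and seed are assumptions. Drawing samples that also conform with later observations needs the adapted transition matrices mentioned above, which are not shown here.

```python
import random

def sample_trajectory(M, start_state, length, rng):
    """Draw one possible trajectory (list of state indices) from the Markov chain."""
    traj = [start_state]
    for _ in range(length):
        row = M[traj[-1]]
        traj.append(rng.choices(range(len(row)), weights=row)[0])
    return traj

def estimate_window_probability(M, start_state, window_states, t_from, t_to,
                                n_samples=20000, seed=0):
    """Fraction of sampled trajectories inside the window at least once in [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start_state, t_to, rng)
        if any(traj[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, t_from=2, t_to=3)
print(estimate)   # should be close to the exact value 0.96
```

The same sampler works for queries without a closed-form matrix solution, such as nearest-neighbor queries, by replacing the predicate evaluated on each sampled trajectory.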

112

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


50

Uncertain Spatial Data Management: Summary

Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty

Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results

Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: Generating Functions

Approximations
– Monte-Carlo sampling
– Probabilistic guarantees

51

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane as a 2D route.]

Approximation: does not capture acceleration/deceleration.

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

– Sequence of (location, time) updates (e.g. GPS-based)
– Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
– (location, time, velocity vector) updates

now ->

55

Modeling Uncertainty
Location Uncertainty Models

– Imprecision of various sources (GPS, sensors)
– Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
• Assumed exact location samples
• Unknown speed in between samples, bounded by a maximum speed

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories
Model 2

Constraints
– Motion along the road network
– Uncertainty restricted to road segments

[Figure: space-time prism on a road network between the samples p1 at t1 and p2 at t2, with a point q at time t.]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories
Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Children of the root: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Each node in turn has children on sub-intervals of its own interval, e.g. [Tri11, tb, t11, D11] and [Tri12, t11, t12, D12], and so on recursively, e.g. [Tri121, t11, t121, D121].]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked; for Tr4, Rmin > Rmax.]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
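The pruning observation can be sketched as follows, assuming that Rmax is the smallest of the objects' maximum possible distances from Q and that every uncertainty region is a disk. The object ids, coordinates and the function name are hypothetical.

```python
import math

def prune_nn_candidates(q, disks):
    """Keep the objects that can have a non-zero probability of being the NN of q.

    q is a crisp 2D point; disks maps an object id to (cx, cy, radius).
    Rmax is the smallest maximum possible distance of any object from q.
    An object whose minimum possible distance exceeds Rmax can never be
    the nearest neighbor.
    """
    def center_dist(d):
        return math.hypot(d[0] - q[0], d[1] - q[1])

    r_max = min(center_dist(d) + d[2] for d in disks.values())
    return {oid for oid, d in disks.items()
            if max(center_dist(d) - d[2], 0.0) <= r_max}

disks = {
    "Tr1": (2.0, 0.0, 1.0),    # distance to Q between 1 and 3
    "Tr2": (0.0, 3.0, 1.0),    # distance to Q between 2 and 4
    "Tr4": (10.0, 0.0, 1.0),   # distance to Q between 9 and 11
}
print(sorted(prune_nn_candidates((0.0, 0.0), disks)))
```

Here Rmax = 3 (from Tr1), so Tr4 with a minimum distance of 9 is pruned, while Tr2 remains a candidate.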

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the location pdf of Tri over the disk of radius Rd around Q (a double integral).

The probability that a given object Trj is the NN of Q follows from this: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible values of Rd (NOTE: the upper bound is Rmax…)
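The two probabilities described above can be written out as follows. The notation ($P^{WD}_i$ for the within-distance probability, $pdf_i$ for the location pdf of Tri, $P^{NN}_j$ for the NN probability) is an assumption made for this sketch and may differ from the notation used on the original slide.

```latex
% Within-distance probability of Tr_i (double integral over the disk around Q)
P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy

% NN probability of Tr_j: Tr_j is at distance R_d and all others are farther away
P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d}
           \prod_{i \ne j} \left( 1 - P^{WD}_i(R_d) \right) dR_d
```

The derivative of $P^{WD}_j$ is the density of Trj's distance from Q, and the product is the probability that every other object lies farther away than $R_d$.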

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain TrQ and the uncertainty disks of Tr1 to Tr5, with Rmin and Rmax marked; two points Z1 and Z2 satisfy dist(Z1, Z2) < Rmax.]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each a cylinder of diameter 2r and height 1/(πr²).]

Example: assuming a uniform pdf of the possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane (axis ticks at –40 and +40), each with a base of diameter 4r and a peak height of 3/(4πr²).]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

Combine the two probabilitieshellip

Possible pathsPossible locations when t=4

a

m

Probability of falling inside segment mp1

0505 = 025 (at t = 4)

Probability of an object a falling inside some segment S at given time instant t

)(

))(Pr()Pr())(Pr(tPPp

p

a

tPLSpSta eg uniform pdf

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
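To make the "define acceptable error, save on data size" idea concrete, here is a minimal sketch of the classic recursive Douglas-Peucker simplification. It is the textbook variant, not the sped-up algorithm of the cited paper, and `epsilon` (the acceptable error) and the sample track are illustrative assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep only points that deviate more than epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves

track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, 0.1)
```

Every dropped point is within `epsilon` of the simplified polyline, which is exactly the location uncertainty that the reduced data set introduces.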

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis … one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension

vs.

Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example …

[Figure: time–position diagram (axes t, pos) of a car with the speed bounds vmin and vmax and a query window Q; the parts of Q covered at two points of time are annotated with the probabilities 50% and 30%]

What is the probability that the car is in some area at least once during some time interval?

› Assuming a uniform distribution …
› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
  – Violation of the speed constraints
› Consideration of dependency: = 0.5
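The two numbers of the highway example can be reproduced with a few lines of arithmetic. The independent case follows the formula on the slide. For the dependent case, the sketch assumes (this is an interpretation, not stated on the slide) that every possible trajectory inside Q at the second time was already inside Q at the first time, so the two events are nested.

```python
p_first, p_second = 0.5, 0.3  # probabilities of being inside Q at the two times

# Treating the two time instants as independent random variables:
independent = 1 - (1 - p_first) * (1 - p_second)

# Assumed nested events (speed constraints tie the two instants together):
# "inside Q at least once" then coincides with the more likely of the two events.
dependent = max(p_first, p_second)
```

The gap between the two values is the error introduced by ignoring the temporal dependency.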

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g. Markov chain)
  – Discrete time + continuous space (e.g. Harris chain)
  – Continuous time + discrete space (e.g. Markov process)
  – Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: board with a row of holes. Interior hole: 0.4 to the left, 0.2 stay, 0.4 to the right. Border hole: 0.4 stay, 0.6 to the neighbour]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

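How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]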

94

How can we model this?

› First hit
\[ (0\;\,0\;\,1\;\,0\;\,0\;\,0\;\,0\;\,0\;\,0) \cdot M = (0\;\,0.4\;\,0.2\;\,0.4\;\,0\;\,0\;\,0\;\,0\;\,0) \]
› Second hit
\[ (0\;\,0.4\;\,0.2\;\,0.4\;\,0\;\,0\;\,0\;\,0\;\,0) \cdot M = (0.16\;\,0.16\;\,0.36\;\,0.16\;\,0.16\;\,0\;\,0\;\,0\;\,0) \]
› 40th hit
\[ (0\;\,0\;\,1\;\,0\;\,0\;\,0\;\,0\;\,0\;\,0) \cdot M^{40} \approx (0.08\;\,0.12\;\,0.12\;\,0.12\;\,0.12\;\,0.12\;\,0.12\;\,0.12\;\,0.08) \]
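The vector-matrix products above can be reproduced with a small, dependency-free sketch. The helper names (`build_board_matrix`, `step`, `propagate`) are illustrative, and the matrix is the nine-bucket matrix M of the example.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def propagate(dist, M, hits):
    for _ in range(hits):
        dist = step(dist, M)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third bucket
after_one = propagate(start, M, 1)
after_two = propagate(start, M, 2)
limit = [0.08] + [0.12] * 7 + [0.08]  # distribution shown for the 40th hit
```

Applying `step` to `limit` returns `limit` again (up to rounding), i.e. the distribution shown for the 40th hit is a fixed point of M.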

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g. treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is e.g. 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups
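One simple way to learn transition probabilities from historical data is to count observed state-to-state transitions and normalise per source state. This is a sketch of that maximum-likelihood counting idea only; the state names and the function `estimate_transitions` are hypothetical, and the tutorial does not prescribe a particular estimator.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Estimate P(next state | current state) from discretised trajectories.

    trajectories: iterable of state sequences, one state per tick (assumed input).
    Returns a sparse mapping {state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for states in trajectories:
        for cur, nxt in zip(states, states[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

history = [
    ["s1", "s3", "s2", "s1"],
    ["s3", "s2", "s3"],
    ["s3", "s3"],
]
model = estimate_transitions(history)
```

Storing only the observed transitions keeps the matrix sparse, in line with the remark above.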

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: the states s1, s2, s3 with their transition probabilities, unrolled over t = 0, t = 1, t = 2, t = 3]

Note: We have an exponential number of possible paths the car might take

Distribution over (s1, s2, s3), starting in s1:
  – t = 0: (1, 0, 0)
  – t = 1: (0, 0, 1)
  – t = 2: (0, 0.8, 0.2)
  – t = 3: (0.48, 0.16, 0.36)

Counting only the worlds that have not yet been inside the window: (0, 0, 0.2) remains after t = 2, which yields (0, 0.16, 0.04) at t = 3. Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

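ST - Window Queries

\[
M^{-} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0.6 & 0 & 0.4 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.2 & 0.8 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

Distribution over (s1, s2, s3, s4) for t = 0, 1, 2, 3:

\[ (1\;\,0\;\,0\;\,0) \rightarrow (0\;\,0\;\,1\;\,0) \rightarrow (0\;\,0\;\,0.2\;\,0.8) \rightarrow (0\;\,0\;\,0.04\;\,0.96) \]

[Figure: states s1, s2, s3 and the additional state s4]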
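The construction of the two matrices and the resulting query can be sketched as follows. The sketch assumes 0-based state indices, that the window starts at a tick t_start ≥ 1, and that M⁻ is used before the window and M⁺ inside it; the function names are illustrative, not the authors' code.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def window_matrices(M, window):
    """Build M- and M+ with one extra absorbing state for trajectories that hit the window."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in M] + [absorbing]
    m_plus = []
    for row in M:
        # Probability mass entering a window state is redirected to the new state.
        kept = [0.0 if j in window else p for j, p in enumerate(row)]
        kept.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(kept)
    m_plus.append(absorbing)
    return m_minus, m_plus

def window_probability(M, start, window, t_start, t_end):
    m_minus, m_plus = window_matrices(M, window)
    v = [1.0 if i == start else 0.0 for i in range(len(M))] + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_start else m_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window={0, 1}, t_start=2, t_end=3)
```

Only three vector-matrix products are needed, instead of enumerating the exponentially many possible paths.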

104

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two location-space over time-space diagrams, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]

106

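Multiple Observations

Classes in the order (S1−, S2−, S3−, S1+, S2+, S3+); "+" marks worlds that have hit the window (∎), "−" marks worlds that have not (not ∎).

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]

  – t = 0: (1, 0, 0, 0, 0, 0)
  – t = 1: (0, 0, 1, 0, 0, 0)
  – t = 2: (0, 0, 0.2, 0, 0.8, 0)
  – t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)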

107

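Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting "the trajectory intersects the window" and obs denoting the observation of the object in s3:

\[
P(\blacksquare \mid \mathit{obs})
= \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})}
= \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})}
= \frac{P(\mathit{obs} \wedge \blacksquare)}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)}
\]

Using the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

\[ \frac{0.32}{0.32 + 0.04} = 0.89 \]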
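The class-doubling construction and the final conditioning step can be sketched in a few lines. The sketch assumes 0-based state indices, a window starting at tick t_start ≥ 1, and a second observation taken at the last tick of the window; all function names are illustrative.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def doubled_matrices(M, window):
    """Matrices over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in M] + [zero + row[:] for row in M]
    m_plus = []
    for row in M:  # transitions out of the "-" classes
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    for row in M:  # "+" classes never return to "-"
        m_plus.append(zero + row[:])
    return m_minus, m_plus

def hit_probability_given_observation(M, start, window, t_start, t_end, observed):
    """P(window was hit | object observed in state `observed` at t_end)."""
    n = len(M)
    m_minus, m_plus = doubled_matrices(M, window)
    v = [1.0 if i == start else 0.0 for i in range(2 * n)]
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_start else m_minus)
    hit, miss = v[n + observed], v[observed]
    return hit / (hit + miss)  # undefined if the observation has probability 0

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = hit_probability_given_observation(
    M, start=0, window={0, 1}, t_start=2, t_end=3, observed=2)
```

The division at the end is the Bayes step: only the two classes compatible with the observation (S3+ and S3−) are kept and renormalised.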

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: time–location diagram of an index entry for an object]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
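The basic Monte-Carlo idea can be sketched for the simplest setting: a single observation at t = 0 and plain forward sampling from the transition matrix. The adaptation of transition matrices needed to make samples conform to several observations is not shown; function names, the sample count and the seed are illustrative assumptions.

```python
import random

def sample_trajectory(M, start, length, rng):
    """Draw one possible trajectory of `length` ticks from the Markov chain."""
    states = [start]
    for _ in range(length):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def estimate_window_probability(M, start, window, t_start, t_end, samples, seed=0):
    """Fraction of sampled trajectories that are inside the window at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        traj = sample_trajectory(M, start, t_end, rng)
        if any(traj[t] in window for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / samples

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(
    M, start=0, window={0, 1}, t_start=2, t_end=3, samples=20000)
```

With enough samples, the estimate approaches the exact value 0.96 of the window-query example; the same sampling loop also works for predicates, such as nearest-neighbour relations, that have no closed matrix form.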

112

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


51

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

52

• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: 3D trajectory over the axes X, Y and Time (up to the present time) and the corresponding 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

[Figure: trajectory sketches for the three models, with the current time marked "now"]

55

Modeling Uncertainty

Location uncertainty models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: road network with the sample positions p1 (at t1) and p2 (at t2) and a position q at an intermediate time t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g. possible routes to be taken

60

Example: inside a range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with T = tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Children of the root: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children over sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and these in turn have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q with the distances Rmin, Rmax and Rd and the uncertainty disks of Tr1, …, Tr5; annotation: Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: R4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by: [equation missing in the extracted text]

The probability that a given object Trj is the NN of Q is given by: [equation missing in the extracted text]

i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax …)

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable … Example:

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5 with the distances Rmin and Rmax, and two points Z1, Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide) …

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each defined over a disk of diameter 2r with height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi is within distance Rd of Vq.

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
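The convolution idea can be illustrated on a discrete grid. The sketch below assumes that the two location distributions are independent pmfs given as dictionaries from (x, y) grid cells to probabilities, which is a discretised stand-in for the continuous uniform pdfs of the slides; the function names are illustrative.

```python
import math
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi + (-Vq) for independent Vi and Vq (a discrete convolution)."""
    out = defaultdict(float)
    for (xi, yi), p_i in pmf_i.items():
        for (xq, yq), p_q in pmf_q.items():
            out[(xi - xq, yi - yq)] += p_i * p_q
    return dict(out)

def within_distance_probability(pmf_iq, rd):
    """P(|Viq| <= rd): probability that object i is within distance rd of the query object."""
    return sum(p for (x, y), p in pmf_iq.items() if math.hypot(x, y) <= rd)

v_i = {(0, 0): 0.5, (1, 0): 0.5}
v_q = {(0, 0): 0.5, (0, 1): 0.5}
v_iq = difference_pmf(v_i, v_q)
prob = within_distance_probability(v_iq, 1.0)
```

Once the difference variable is available, the uncertain query object can be treated as a crisp point at the origin, which is what reduces the quadruple integration to the earlier crisp-query case.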

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane, each defined over a disk of diameter 4r with peak height 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is: [equation missing in the extracted text] (a hyperbola, since A > 0)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example, lower envelope LE12 of the two distance-trajectories TR1 and TR2 over [tb, te], with the critical time-points t11 and t12. NOTE: time-dimension on the horizontal axis, dist on the vertical axis]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time-point t31), merged into LE1234 with the new critical time-points t1new and t2new; axes: time (tb to te) and dist]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
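To illustrate what the lower envelope looks like, here is a deliberately naive sketch that is quadratic in the number of trajectories, i.e. it is not the O(N log N) divide-and-conquer construction. It assumes that each squared distance function has the form A·t² + B·t + C and is passed as a tuple (A, B, C); since distances are non-negative, comparing squared distances gives the same ordering. All names are illustrative.

```python
import math

def critical_times(f, g, tb, te):
    """Times in (tb, te) where two squared distance functions intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return [t for t in roots if tb < t < te]

def lower_envelope(funcs, tb, te):
    """Return [(trajectory, start, end), ...]: the nearest trajectory on each sub-interval."""
    names = list(funcs)
    cuts = {tb, te}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            cuts.update(critical_times(funcs[first], funcs[second], tb, te))
    times = sorted(cuts)
    envelope = []
    for lo, hi in zip(times, times[1:]):
        mid = (lo + hi) / 2  # the ordering cannot change between critical time-points
        best = min(names, key=lambda n: funcs[n][0] * mid * mid + funcs[n][1] * mid + funcs[n][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)
        else:
            envelope.append((best, lo, hi))
    return envelope

funcs = {"Tr1": (1.0, 0.0, 1.0),     # d^2(t) = t^2 + 1
         "Tr2": (1.0, -8.0, 17.0)}   # d^2(t) = (t - 4)^2 + 1
envelope = lower_envelope(funcs, 0.0, 6.0)
```

Removing the trajectories of the envelope on their sub-intervals and repeating the computation on the rest gives the next ranked envelope, which is the idea behind the deeper levels of the IPAC-NN tree.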

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1, …, TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform …

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t)

73

Combine the two probabilities …

[Figure: possible paths and possible locations of object a when t = 4, with the segment between m and p1 highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. uniform pdf):

\[ \Pr(a(t) \in S) = \sum_{p \in PP_a(t)} \Pr(p) \cdot \Pr(S \mid PL_p(t)) \]
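The 0.5 · 0.5 = 0.25 computation can be written as a small sum over possible paths. The sketch assumes a uniform pdf over the possible locations on each path and describes positions as offsets along the path; the data layout, the offsets and the function name are hypothetical and only chosen so that the numbers match the example.

```python
def segment_probability(paths, segment_id):
    """Sum over paths of P(path) * P(object inside the segment | path), uniform pdf assumed."""
    total = 0.0
    for path in paths:
        lo, hi = path["possible_locations"]  # offsets along the path reachable at time t
        covered = 0.0
        for seg, s_lo, s_hi in path["segments"]:
            if seg == segment_id:
                # Length of the overlap between the segment and the possible locations.
                covered += max(0.0, min(hi, s_hi) - max(lo, s_lo))
        if hi > lo:
            total += path["prob"] * covered / (hi - lo)
    return total

paths = [
    {"prob": 0.5, "possible_locations": (2.0, 6.0),
     "segments": [("m-p1", 2.0, 4.0), ("rest", 4.0, 6.0)]},
    {"prob": 0.5, "possible_locations": (1.0, 3.0),
     "segments": [("other-edge", 0.0, 3.0)]},
]
result = segment_probability(paths, "m-p1")
```

The first factor is the probability of the path, the second the share of the possible locations on that path that fall inside the segment.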

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

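How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$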

94

How can we model this?

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$

where M is the 9 × 9 transition matrix of the board example.
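The vector-matrix products of this slide can be traced with a small script. This is a minimal sketch in plain Python; the helper names `step` and `build_board_matrix` are illustrative and not part of the tutorial.

```python
def step(dist, M):
    """One hit of the board: row vector `dist` times transition matrix `M`."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6                    # left border
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4    # right border
    return M


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # the ball starts in the third bucket
first = step(start, M)
second = step(first, M)

# Repeated multiplication gives the distribution after the 40th hit.
dist = start
for _ in range(40):
    dist = step(dist, M)

print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in dist])
```

Each hit costs one vector-matrix product, so the distribution after k hits is obtained with k products instead of enumerating all possible paths of the ball.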

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: state graph over s1, s2, s3 labelled with the transition probabilities, unrolled over t=0, t=1, t=2, t=3]

Note: We have an exponential number of possible paths the car might take

Distribution over (s1, s2, s3), starting in s1:
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

The worlds that are in s2 at t=2 (probability 0.8) already satisfy the query. Of the remaining 0.2 in s3, 0.16 moves to s2 and 0.04 stays in s3 at t=3: (0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

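$$
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

State vectors over (s1, s2, s3, s4) for t=0, …, t=3:

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)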
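The construction of the two matrices and the resulting query probability can be sketched as follows. This is a minimal illustration in plain Python, assuming that the start time lies outside the query window; the function names and the 0-based state indices (s1 = 0, s2 = 1, s3 = 2, extra state s4 = 3) are illustrative.

```python
def step(dist, M):
    """Row vector `dist` times matrix `M`."""
    return [sum(dist[i] * M[i][j] for i in range(len(M)))
            for j in range(len(M[0]))]


def window_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing state for the hit worlds."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: used for ticks outside the window, transitions are unchanged.
    m_minus = [row[:] + [0.0] for row in M] + [absorbing]
    # M+: used for ticks inside the window, every transition into a window
    # state is redirected to the extra state.
    m_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window_states:
                new_row[n] += p
            else:
                new_row[j] = p
        m_plus.append(new_row)
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(M, start_state, window_states, window_times, horizon):
    """Probability of being in a window state at least once during window_times."""
    m_minus, m_plus = window_matrices(M, window_states)
    dist = [0.0] * (len(M) + 1)
    dist[start_state] = 1.0
    for t in range(1, horizon + 1):
        dist = step(dist, m_plus if t in window_times else m_minus)
    return dist[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start_state=0, window_states={0, 1},
                            window_times={2, 3}, horizon=3)
print(round(result, 2))
```

Because the extra state is absorbing, a world that has hit the window once is never counted twice, which is what makes the dependency between consecutive ticks come out correctly.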

104

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two diagrams with axes "time space" and "location space"; the first shows the possible locations extrapolated from a single observation at t0, the second shows them between two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

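State vectors over (S1-, S2-, S3-, S1+, S2+, S3+) for t=0, …, t=3:

(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)

The classes Si- contain the worlds that have not hit the window (¬∎), the classes Si+ those that have.

$$
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$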

107

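Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting "the trajectory intersects the query window" and obs denoting the observation:

$$
P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
$$

State vector over (S1-, S2-, S3-, S1+, S2+, S3+) at t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$
\frac{0.32}{0.32 + 0.04} = 0.89
$$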
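The doubled state space and the final Bayes step can be sketched as follows. This is a minimal illustration in plain Python, assuming that the second observation (object seen in s3) is made at t=3; the function name `doubled_matrix` and the 0-based indices are illustrative.

```python
def step(dist, M):
    """Row vector `dist` times matrix `M`."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def doubled_matrix(M, window_states, in_window):
    """Matrix over (S1-, ..., Sn-, S1+, ..., Sn+); M+ if in_window, else M-."""
    n = len(M)
    D = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            D[n + i][n + j] = p          # hit worlds stay in the '+' classes
            if in_window and j in window_states:
                D[i][n + j] = p          # a non-hit world enters the window
            else:
                D[i][j] = p              # a non-hit world stays non-hit
    return D


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
window_states, window_times = {0, 1}, {2, 3}

dist = [1.0] + [0.0] * (2 * n - 1)       # first observation: s1 at t=0
for t in range(1, 4):
    dist = step(dist, doubled_matrix(M, window_states, t in window_times))

observed = 2                              # second observation: s3 (index 2) at t=3
p_hit_and_obs = dist[n + observed]        # mass in S3+
p_miss_and_obs = dist[observed]           # mass in S3-
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
print(round(posterior, 2))
```

Keeping one '-' and one '+' copy per state is what allows the conditioning: after the last tick, both the hit status and the final location of every class of worlds are still known.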

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-Tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: the indexed space, with axes time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
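The sampling idea can be illustrated on the small three-state chain of the window-query example. This is a minimal sketch of plain forward sampling from a single observation at t=0; it is not the adapted-transition-matrix technique of the cited paper, and the function names, the sample count, and the seed are illustrative choices.

```python
import random


def sample_trajectory(M, start_state, horizon, rng):
    """Draw one possible world: a state sequence for t = 0 .. horizon."""
    states = [start_state]
    for _ in range(horizon):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_window_probability(M, start_state, window_states, window_times,
                                horizon, num_samples, rng):
    """Ratio of sampled worlds that are in a window state during window_times."""
    hits = 0
    for _ in range(num_samples):
        trajectory = sample_trajectory(M, start_state, horizon, rng)
        if any(trajectory[t] in window_states for t in window_times):
            hits += 1
    return hits / num_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
rng = random.Random(42)
estimate = estimate_window_probability(M, start_state=0, window_states={0, 1},
                                       window_times=[2, 3], horizon=3,
                                       num_samples=20000, rng=rng)
print(estimate)
```

With a second observation, forward samples that miss the observed state would have to be discarded, which becomes wasteful when that state is unlikely; this is the inefficiency the adapted transition matrices are meant to avoid.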

112

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


52

• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty – the flip-side

53

Model of a trajectory

[Figure: a 3D trajectory over the axes X, Y and Time, drawn up to the present time, together with its projection onto the X-Y plane, the 2D route]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
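The bead between two consecutive samples can be expressed as a simple membership test. This is a minimal sketch, assuming free-space Euclidean motion and a known speed bound `vmax`; the function name `in_bead` and the sample layout are illustrative.

```python
import math


def in_bead(p, t, sample1, sample2, vmax):
    """True if location p at time t is possible between two exact samples.

    Each sample is ((x, y), time). The object must be able to reach p from
    the first sample and to reach the second sample from p without
    exceeding the speed bound vmax.
    """
    (p1, t1), (p2, t2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, p) <= vmax * (t - t1)
            and math.dist(p, p2) <= vmax * (t2 - t))


first_sample = ((0.0, 0.0), 0.0)
second_sample = ((2.0, 0.0), 4.0)
print(in_bead((1.0, 1.0), 2.0, first_sample, second_sample, vmax=1.0))
print(in_bead((1.0, 3.0), 2.0, first_sample, second_sample, vmax=1.0))
```

At a fixed time t the possible locations form the intersection of two disks, and a sequence of such beads over consecutive samples forms the necklace.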

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along a road network

Uncertainty restricted to road segments

[Figure: a road network with the sample locations p1 (at time t1) and p2 (at time t2) and a location q at time t in between]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?
2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y, T) space, with the planes T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 with location uncertainty in the (X, Y, T) space, with T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

⇒ What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children covering sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, which may in turn have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent were excluded, would have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

[Figure: the crisp query point Q and the uncertainty disks of Tr1, …, Tr5; Rmin and Rmax are marked, and for Tr4 the minimum distance from Q is greater than Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is obtained by integrating the pdf of Tri over the disk of radius Rd around Q.

The probability that a given object Trj is the NN of Q follows from the observation that Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
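The two probabilities described on this slide can be written out as follows. This is a sketch reconstructed from the verbal description only; the symbols $P^{WD}_i$, $P^{NN}_j$, $pdf_i$ and $D(Q, R_d)$ are notation assumed here, not taken from the slide.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q,
% where D(Q, R_d) is the disk of radius R_d centred at Q (a double integral):
P^{WD}_i(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y) \, dx \, dy

% Probability that Tr_j is the nearest neighbour of Q: Tr_j is at distance R_d
% and every other object is farther away, integrated over all R_d up to R_max:
P^{NN}_j = \int_{0}^{R_{max}} \frac{d P^{WD}_j(R_d)}{d R_d}
           \prod_{i \neq j} \bigl( 1 - P^{WD}_i(R_d) \bigr) \, d R_d
```

The product term assumes that the locations of the individual objects are independent of each other.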

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: the uncertainty disk of TrQ with two possible query locations Z1 and Z2, and the uncertainty disks of Tr1, …, Tr5; Rmin and Rmax are marked, and dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the x-y plane (vertical axis: pdf); each is a cylinder over an uncertainty disk of diameter 2r]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation:
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable:
Viq = Vi + (–Vq)
which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) plotted over the x-y plane (vertical axis: pdf); each has a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1:
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2:
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs)
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
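The convolution and the centroid property can be checked on a small discretized example. This is a minimal sketch that assumes a one-dimensional, discrete stand-in for the 2D continuous pdfs of the slides; the distributions and the helper names are illustrative.

```python
def convolve(p, q):
    """pmf of X + Y for independent X ~ p and Y ~ q (dicts: value -> probability)."""
    out = {}
    for x, px in p.items():
        for y, py in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


def negate(p):
    """pmf of -X."""
    return {-x: px for x, px in p.items()}


def mean(p):
    """Expected value, i.e. the centroid of the pmf."""
    return sum(x * px for x, px in p.items())


v_i = {x: 0.2 for x in range(8, 13)}    # uniform on 8..12, centroid 10
v_q = {x: 0.2 for x in range(-2, 3)}    # uniform on -2..2, centroid 0
v_iq = convolve(v_i, negate(v_q))       # pmf of V_iq = V_i + (-V_q)

print(mean(v_iq))
print(sorted(v_iq.items()))
```

The resulting distribution is wider than either input (its support is the sum of both supports), is centred at the difference of the two centroids, and is symmetric around that centre because both inputs are symmetric.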

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is a hyperbola, since A > 0 (A being the coefficient of the quadratic term in t)

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions
Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t))
Call such pair-wise intersections critical time-points

Example: lower envelope of two distance-trajectories

[Figure: distance functions TR1 and TR2 over time (horizontal axis: time, vertical axis: dist) and their lower envelope LE12, with the critical time-points t11 and t12 after tb]
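The "at most twice" observation follows from squaring both distance functions. This is a minimal sketch, assuming that the expected relative location moves linearly, so that d(t)² = A·t² + B·t + C with A = Vx² + Vy²; the function names and the example values are illustrative.

```python
import math


def distance_coefficients(x0, y0, vx, vy):
    """Coefficients (A, B, C) of d(t)^2 for the relative location (x0 + vx*t, y0 + vy*t)."""
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(f, g, tb, te):
    """Times in [tb, te] at which two distance functions are equal.

    Setting f(t)^2 = g(t)^2 gives a quadratic equation in t, hence at most
    two solutions. Identical functions are reported as having none.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


d1 = distance_coefficients(0.0, 1.0, 0.0, 0.0)    # constant distance 1
d2 = distance_coefficients(-2.0, 0.0, 1.0, 0.0)   # distance |t - 2|
print(critical_time_points(d1, d2, tb=0.0, te=10.0))
```

Between two consecutive critical time-points the order of the two functions does not change, which is what the lower-envelope construction exploits.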

70

The continuous aspect of the probabilistic NN queries

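Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new; horizontal axis: time, vertical axis: dist]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?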

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] (vertical axis: dist) with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj which a can reach from pi within the time period t – ti and from which a can reach pi+1 within the time period ti+1 – t

73

Combine the two probabilitieshellip

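[Figure: the possible paths of object a and its possible locations when t = 4; a segment m is highlighted]

Probability of falling inside segment m:
p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with uniform pdf):

$$
\Pr(a(t) \in S) = \sum_{p \in PP_a(t)} \Pr(p) \cdot \Pr(S \cap PL_p(t))
$$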

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query location q; the expansion tree "bubbles" along the road-network segments; annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]
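The network-expansion step can be sketched as a bounded shortest-path search. This is a minimal illustration, assuming that q coincides with a vertex and that an edge is kept only if its far endpoint lies within network distance r (partially covered edges are ignored); the graph layout and the function name are illustrative.

```python
import heapq


def expansion_tree(graph, source, r):
    """Dijkstra-style expansion from `source`, cut off at network distance r.

    graph: dict mapping a vertex to a list of (neighbor, edge_length).
    Returns (dist, parent): network distances of the reached vertices and the
    tree edges as child -> parent links.
    """
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


road_network = {
    "q": [("A", 1.0), ("B", 4.0)],
    "A": [("q", 1.0), ("C", 1.5)],
    "B": [("q", 4.0)],
    "C": [("A", 1.5)],
}
dist, parent = expansion_tree(road_network, "q", r=3.0)
print(dist)
print(parent)
```

The edges of the resulting tree are the ones whose movement R-trees would be fetched in the filtering step.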

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
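The trade-off between an acceptable error and data size can be illustrated with the basic Douglas-Peucker line simplification. This is a minimal sketch of the plain recursive algorithm applied to a 2D route, not the sped-up variant of the cited paper; the function names and the sample route are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))   # clamp the projection to the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Keep a subset of points such that no dropped point deviates by more than epsilon."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right    # the split point appears in both halves


route = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(route, epsilon=0.1))
```

The dropped points are exactly the ones whose location can afterwards only be bounded by epsilon, which is how data reduction turns into location uncertainty.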

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

112

rsaquo Application RFID sensors track individuals in an indoor environment

rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office

rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions

Event queries

Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

113

Summary

rsaquo Models for spatial uncertainty

rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds

rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)

rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley

rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012

rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias

Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49

rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181

rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014

rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

Page 53: Managing Uncertainty in Spatial and Spatio-temporal Data

53

Model of a trajectory

[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory along the Time axis, up to the present time]

Approximation does not capture acceleration/deceleration

Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes

Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011

54

Mobility Models

Sequence of (location, time) updates (e.g. GPS-based)

Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory

(location, time, velocity vector) updates

now -gt

55

Modeling Uncertainty

Location Uncertainty Models

Various sources’ imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2: Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: road network with sample points p1 and p2, a point q, and a time axis t with marks t1, t2]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently along edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving objects trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks T = tb, T = t1, T = te]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time marks T = tb, tb1, t1, t11, te]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper nodes cover sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q surrounded by the uncertainty disks of Tr1 … Tr5, with the distances Rmin, Rmax and Rd; label “Rmin > Rmax”]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by [formula image]

The probability that a given object Trj is the NN of Q is given by [formula image]

i.e. Trj is the NN if it’s at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
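The pruning observation above can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not stated on the slide: every possible-location region is a disk given as a hypothetical tuple (cx, cy, r), Rmax is taken to be the smallest of the objects' maximum distances from Q, and the function name `prune_nn_candidates` is purely illustrative.

```python
import math

def prune_nn_candidates(q, disks):
    """Return the indices of uncertain objects that may still be the NN of q.

    q     -- crisp query location (x, y)
    disks -- list of (cx, cy, r): center and radius of each uncertainty disk
    """
    # Distance from q to the center of each disk.
    center_dist = [math.hypot(cx - q[0], cy - q[1]) for cx, cy, _ in disks]
    # Largest and smallest possible distance of each object from q.
    max_dist = [d + r for d, (_, _, r) in zip(center_dist, disks)]
    min_dist = [max(0.0, d - r) for d, (_, _, r) in zip(center_dist, disks)]
    # Assumed definition of Rmax: some object is certainly within this distance.
    r_max = min(max_dist)
    # An object whose closest possible position lies beyond Rmax can never be the NN.
    return [i for i, lo in enumerate(min_dist) if lo <= r_max]

# Hypothetical data: four uncertainty disks around a query at the origin.
disks = [(1.0, 0.0, 0.5), (2.5, 0.0, 0.5), (0.0, 1.2, 0.5), (5.0, 5.0, 0.5)]
candidates = prune_nn_candidates((0.0, 0.0), disks)
```

Only the surviving candidates would need the (more expensive) integration over Rd; the pruned objects have probability zero by the observation above.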

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1 … Tr5, with Rmin, Rmax and two points Z1, Z2 such that dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over the x-y plane, each over a disk of diameter 2r with height 1/(πr²)]

Example: assuming a uniform pdf of possible locations, but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for [formula image]

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: the convolved pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the x-y plane, each with a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is [formula image]

a hyperbola, since A > 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: Example. Lower envelope LE12 of two distance-trajectories TR1 and TR2 (distance over time, time-dimension on the horizontal axis), with critical time-points t11 and t12 after tb]
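The "at most twice" observation can be illustrated with a short Python sketch. It assumes, beyond what the slide text preserves, that each expected relative location moves linearly, so that its squared distance from the origin is a quadratic A·t² + B·t + C; two distance functions are then equal exactly where the difference of their quadratics vanishes. The helper names `sq_dist_coeffs` and `critical_times` are hypothetical.

```python
import math

def sq_dist_coeffs(x0, y0, vx, vy):
    """Coefficients (A, B, C) of the squared distance A*t^2 + B*t + C from the origin
    for a point starting at (x0, y0) and moving with constant velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two such distance functions are equal (at most two)."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if abs(a) < eps:                      # the difference is (at most) linear
        if abs(b) < eps:
            return []                     # constant difference: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []                     # the two functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical example: one object approaches and passes the query,
# the other keeps a constant relative distance of 2.
f = sq_dist_coeffs(4.0, 0.0, -1.0, 0.0)   # squared distance (4 - t)^2
g = sq_dist_coeffs(0.0, 2.0, 0.0, 0.0)    # squared distance 4
times = critical_times(f, g, 0.0, 10.0)
```

Between consecutive critical time-points the order of the two distance functions does not change, which is what makes a piecewise lower envelope well defined.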

70

The continuous aspect of the probabilistic NN queries

Q: how to efficiently construct the lower envelope of the whole collection of distance functions?

A: divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], introducing the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous-pruning criterion, i.e. any trajectory whose distance function is further than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3, showing the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e. [formula image]

Example: if the pdf of selecting a possible path is uniform [formula image]

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t), i.e. [formula image]
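The definition of the possible paths can be sketched as a bounded depth-first enumeration. This is only an illustration under simplifying assumptions that are not in the slides: the two samples are assumed to lie on vertices, the road network is a hypothetical adjacency dictionary with one minimum travel time per edge, and `possible_paths` is an illustrative name.

```python
def possible_paths(graph, src, dst, budget):
    """Enumerate simple paths from src to dst whose minimum time cost fits the budget.

    graph  -- dict: vertex -> list of (neighbor, minimum travel time of the edge)
    budget -- available time between the two samples, i.e. t_{i+1} - t_i
    """
    paths = []

    def extend(path, cost):
        node = path[-1]
        if node == dst:
            paths.append((list(path), cost))
            return
        for nxt, w in graph.get(node, []):
            # Skip visited vertices and edges that would exceed the time budget.
            if nxt not in path and cost + w <= budget:
                path.append(nxt)
                extend(path, cost + w)
                path.pop()

    extend([src], 0.0)
    return paths

# Hypothetical network and travel times (not taken from the slides).
graph = {"v1": [("v5", 3.0), ("v3", 2.0)], "v3": [("v4", 2.0)], "v4": [("v5", 1.0)], "v5": []}
paths = possible_paths(graph, "v1", "v5", 5.0)
# With a uniform pdf over the possible paths, each path gets the same probability.
uniform_prob = 1.0 / len(paths)
```

With the budget reduced to 4.0 only the direct path remains, which shows how the time between two samples limits the set of possible paths.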

73

Combine the two probabilities…

[Figure: possible paths and possible locations when t = 4, for object a and segment m]

Probability of falling inside segment m (p1): 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. uniform pdf):

Pr(a(t) ∈ S) = Σ_{p ∈ PP(a)} Pr(p) · Pr(S ∩ PL_p(t))

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query q; the expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”. SSHD 1997

79

Uncertainty – the other-flip-side

› What are the important properties not to be distorted with reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long should the data at the sink be allowed to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, with query region Q; figure labels 50 and 30 at two time steps]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob: Stochastic Processes. Wiley, 1953

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole; 0.6 and 0.4 at the border]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

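› A Markov Chain is a “memoryless” Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]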

94

How can we model this

With the 9 × 9 transition matrix M of the example:

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
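The hits of the wooden-board example can be reproduced with plain vector-matrix products. The following is a minimal sketch in pure Python; the function names `step` and `build_board_matrix` are illustrative, and the matrix is rebuilt from the probabilities of the example (0.4 / 0.2 / 0.4 for inner buckets, 0.4 stay and 0.6 move at the border).

```python
def step(dist, M):
    """One hit: multiply the row vector `dist` by the transition matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def build_board_matrix(n=9, stay=0.2, move=0.4):
    """Transition matrix of the board example (border rows: 0.4 stay, 0.6 move)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M

M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball starts in the third bucket
after_one = step(start, M)        # distribution after the first hit
after_two = step(after_one, M)    # distribution after the second hit

# The vector shown for the 40th hit is a fixed point of M (a stationary distribution).
stationary = [0.08] + [0.12] * 7 + [0.08]
check = step(stationary, M)
```

Since `stationary` is mapped onto itself by M, repeated hits move the distribution towards it, which matches the values shown for the 40th hit.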

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
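One simple way to learn transition probabilities from historical data is to count observed state-to-state moves and normalize each row. The slides do not say which estimator is used, so the sketch below is an assumption: it uses relative frequencies, hypothetical state sequences (one state per tick), and the illustrative name `estimate_transition_matrix`.

```python
from collections import Counter

def estimate_transition_matrix(sequences, n_states):
    """Relative-frequency estimate of the transition matrix from state sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):      # consecutive ticks form one transition
            counts[(a, b)] += 1
    M = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        total = sum(counts[(i, j)] for j in range(n_states))
        if total:                           # states never left in the data keep a zero row
            for j in range(n_states):
                M[i][j] = counts[(i, j)] / total
    return M

# Hypothetical historical trajectories over three states 0, 1, 2.
history = [[0, 2, 1, 0, 2], [1, 2, 2, 1, 0], [2, 1, 2, 1, 2]]
M_hat = estimate_transition_matrix(history, 3)
```

Most state pairs never occur in real road networks, so a sparse representation (for example a dictionary of non-zero entries) would be the natural choice for larger state spaces.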

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: transition graph over the states s1, s2, s3, unrolled over t = 0, t = 1, t = 2, t = 3]

Note: We have an exponential number of possible paths the car might take

State distributions, starting in s1 at t = 0:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

Keeping only the worlds that have not entered the window before (the 0.8 in s2 at t = 2 is already a hit), the vector at t = 3 is (0, 0.16, 0.04), so Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

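\[
M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

States: s1, s2, s3 and the new state s4

(1, 0, 0, 0) · M- = (0, 0, 1, 0)
(0, 0, 1, 0) · M+ = (0, 0, 0.2, 0.8)
(0, 0, 0.2, 0.8) · M+ = (0, 0, 0.04, 0.96)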
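The two-matrix solution can be written down generically. The sketch below is an illustration, not the authors' implementation: it appends one absorbing "hit" state to the three-state example, redirects every transition into the window to that state during the query interval, and reads the result from the last vector component. The names `vec_mat`, `window_matrices` and `window_probability` are hypothetical, and the query interval is assumed to start at t ≥ 1.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_matrices(M, window):
    """Build M_minus (used outside the query interval) and M_plus (used inside it).

    An extra absorbing state (index n) collects all worlds that have hit the window.
    """
    n = len(M)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        hit = sum(p for j, p in enumerate(row) if j in window)
        m_plus.append([0.0 if j in window else p for j, p in enumerate(row)] + [hit])
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus

def window_probability(M, start, window, t_from, t_to):
    """Probability of being in `window` at least once during [t_from, t_to], t_from >= 1."""
    m_minus, m_plus = window_matrices(M, window)
    v = start + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# Window = {s1, s2} (indices 0 and 1), query interval T = [2, 3], start in s1.
p = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
```

The cost is one sparse vector-matrix product per tick, independent of the exponential number of possible paths.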

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]

106

Multiple Observations

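State order: (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- are the worlds that have not hit the window (¬∎) and the classes Si+ are the worlds that have hit it (∎)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]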

107

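Bayes’ Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ the event that the trajectory passes the query window and obs the observation:

\[
P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare)\, P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
\]

State vector at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
= \frac{0.32}{0.32 + 0.04} = 0.89
\]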
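The conditioning on a second observation can be sketched with the doubled state space. The code below is an illustrative sketch only; it assumes the state order (S1-, S2-, S3-, S1+, S2+, S3+) and uses the hypothetical helper names `vec_mat` and `doubled_matrices`.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def doubled_matrices(M, window):
    """M_minus and M_plus over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    # Outside the query interval nothing changes class: block-diagonal (M, M).
    m_minus = [row[:] + zero for row in M] + [zero + row[:] for row in M]
    m_plus = []
    for row in M:
        # Worlds that have not hit yet: moving into the window switches to the '+' copy.
        stay = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    m_plus += [zero + row[:] for row in M]   # worlds that have hit stay in the '+' copy
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # observed in s1 at t = 0, no hit yet
v = vec_mat(v, m_minus)              # t = 1 (outside the query interval)
v = vec_mat(v, m_plus)               # t = 2 (inside the query interval)
v = vec_mat(v, m_plus)               # t = 3 (inside the query interval)

# Second observation: the object is seen in s3 (index 2) at t = 3.
n = len(M)
p_hit_given_obs = v[n + 2] / (v[2] + v[n + 2])
```

Only the two components belonging to the observed state (S3- and S3+) enter the result, which is exactly the Bayes ratio 0.32 / (0.32 + 0.04).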

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries over location and time; label: Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
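The sampling idea can be illustrated on the window query of the three-state example. This is a deliberately simple sketch: it samples forward from a single observation and does not implement the adaptation of transition matrices to multiple observations described in the cited paper. The function names, the sample size and the seed are assumptions made for the illustration.

```python
import random

def sample_trajectory(M, start_state, steps, rng):
    """Draw one possible world: a state sequence of length steps + 1."""
    traj = [start_state]
    for _ in range(steps):
        row = M[traj[-1]]
        # Pick the next state according to the transition probabilities of the row.
        traj.append(rng.choices(range(len(row)), weights=row)[0])
    return traj

def estimate_window_probability(M, start_state, window, t_from, t_to, n_samples, seed=0):
    """Ratio of sampled worlds that are inside `window` at least once in [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start_state, t_to, rng)
        if any(traj[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# Same query as in the window-query example: s1 or s2 during T = [2, 3], start in s1.
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, 20000)
```

The estimate approaches the exact value 0.96 of the matrix-based solution as the number of samples grows; the same sampler can be reused for queries, such as nearest-neighbor queries, that have no comparably simple matrix formulation.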

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley

rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012

rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias

Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49

rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181

rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014

rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728


54

Mobility Models

Sequence of (location, time) updates (e.g., GPS-based)

Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory

(location, time, velocity vector) updates

now ->

55

Modeling Uncertainty

Location Uncertainty Models

Various sources' imprecision (GPS, sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)

• Assumed exact location samples
• Unknown speed in between samples, bounded by a maximum speed

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
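A minimal sketch of the bead (space-time prism) test between two exact samples, assuming 2D Euclidean movement. The function name `in_bead` and the sample values are hypothetical and not taken from the cited papers.

```python
import math

def in_bead(p, t, p1, t1, p2, t2, v_max):
    """True if the object can be at 2D point p at time t, given exact samples
    (p1, t1) and (p2, t2) and a maximum speed v_max in between."""
    if not t1 <= t <= t2:
        return False
    # p must be reachable from the first sample, and the second sample
    # must still be reachable from p, without exceeding v_max.
    return (math.dist(p1, p) <= v_max * (t - t1)
            and math.dist(p, p2) <= v_max * (t2 - t))

# Hypothetical samples: (0, 0) at t = 0 and (10, 0) at t = 10.
print(in_bead((5, 3), 5, (0, 0), 0, (10, 0), 10, 1.5))
print(in_bead((5, 5), 5, (0, 0), 0, (10, 0), 10, 1.0))
```

At a fixed time t the two conditions describe the intersection of two disks, which is the cross-section of the bead at that time.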

57

Modeling Uncertain Trajectories

Model 2

Constraints

Motion along road network

Uncertainty restricted to road segments

[Figure: space-time prism on a road network; labels p1, p2, q, t1, t2 and time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on the location error at any time instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Given: a collection of moving-object trajectories Tr1, Tr2, …, TrN

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations:

Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Children of the root: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels hold nodes of the same form over sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children:
- If the parent is excluded: the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

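Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked; one object is annotated with Rmin > Rmax]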

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
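The two formulas referred to on this slide did not survive in the transcript. A possible reconstruction, consistent with the verbal description above, is sketched here. The symbols P^{WD}, pdf^{WD}, U_i and R^{min}_j are assumed notation, not taken from the slide.

```latex
% Probability that Tr_i lies within distance R_d of Q
% (U_i = uncertainty disk of Tr_i, pdf_i = its location pdf)
P^{WD}_{i}(R_d) = \iint_{\{(x,y) \in U_i \,:\, \|(x,y) - Q\| \le R_d\}} pdf_i(x, y)\, dx\, dy

% Probability that Tr_j is the NN of Q, where
% pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d) and R^{min}_j is the minimum distance of U_j from Q
P^{NN}_{j} = \int_{R^{min}_j}^{R_{max}} pdf^{WD}_{j}(R_d) \prod_{i \ne j} \left(1 - P^{WD}_{i}(R_d)\right) dR_d
```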

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1 to Tr5, with Rmin and Rmax marked; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over their uncertainty disks; axes x, y and pdf; labelled 2r and 1/(πr²)]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(|Vi – Vq| ≤ Rd)

Define Viq = Vi – Vq (cross-correlation) as another random variable
Viq = Vi + (–Vq)
which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq); axes x, y and pdf; ticks at -40, 0 and +40; labelled 4r and 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1
If pdf(Vi) has a centroid Ci coinciding with E(Vi)
AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs)
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
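The convolution view can be checked empirically. The following is a minimal Monte-Carlo sketch, assuming uniform pdfs over disks of equal radius r; the helper names and the numbers are hypothetical. It samples Viq = Vi – Vq and estimates the probability of being within distance Rd.

```python
import math
import random

def sample_uniform_disk(center, r, rng):
    """Draw one point uniformly from the disk of radius r around center."""
    radius = r * math.sqrt(rng.random())   # sqrt makes the area density uniform
    angle = 2.0 * math.pi * rng.random()
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))

def sample_difference(ci, cq, r, n, seed=0):
    """Draw n samples of Viq = Vi - Vq for independent uniform disks."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        vi = sample_uniform_disk(ci, r, rng)
        vq = sample_uniform_disk(cq, r, rng)
        diffs.append((vi[0] - vq[0], vi[1] - vq[1]))
    return diffs

def prob_within(diffs, rd):
    """Estimate P(|Vi - Vq| <= rd) as the fraction of samples within rd."""
    return sum(1 for dx, dy in diffs if math.hypot(dx, dy) <= rd) / len(diffs)

# Hypothetical setting: Ci = (10, 0), Cq = (0, 0), r = 1.
diffs = sample_difference((10.0, 0.0), (0.0, 0.0), 1.0, 20000)
mean_x = sum(dx for dx, _ in diffs) / len(diffs)
mean_y = sum(dy for _, dy in diffs) / len(diffs)
print(mean_x, mean_y)            # close to Ciq = Ci - Cq
print(prob_within(diffs, 10.0))  # estimate of P(|Vi - Vq| <= 10)
```

All samples of Viq fall within distance 2r of Ciq, which is why the support of the convolved pdf has diameter 4r.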

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

(a hyperbola, since A > 0)
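The distance formula itself is missing from this transcript. A reconstruction under assumed notation is given here: (x_{iq}, y_{iq}) stands for the expected location along TRiq at t = 0, a symbol that is not named on the slide.

```latex
d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\, t)^2 + (y_{iq} + V^{y}_{iq}\, t)^2} = \sqrt{A t^2 + B t + C}

A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2, \quad
B = 2\,(x_{iq} V^{x}_{iq} + y_{iq} V^{y}_{iq}), \quad
C = x_{iq}^2 + y_{iq}^2
```

A is positive whenever the relative velocity is non-zero, and the graph of d_{iq} over t is then one branch of a hyperbola.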

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points.

[Figure: example of the lower envelope LE12 of the two distance functions TR1 and TR2, with critical time points t11 and t12 after tb; dist on the vertical axis. NOTE: time dimension on the horizontal axis]

70

The continuous aspect of the probabilistic NN queries

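Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three dist-over-time plots. Left: lower envelope LE12 of TR1 and TR2 with critical time points t11 and t12. Middle: lower envelope LE34 of TR3 and TR4 with critical time point t31. Right: merged lower envelope LE1234 of TR1 to TR4 over [tb, te], with the additional critical time points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?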
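For illustration only, a brute-force sketch of critical time points and the lower envelope. It is not the O(N log N) divide-and-conquer construction named on the slide. It assumes each distance function is given by the coefficients (a, b, c) of its square, a·t² + b·t + c; this representation and all names are assumptions.

```python
import math

def critical_times(f, g, tb, te):
    """Times in (tb, te) where two squared-distance quadratics (a, b, c) are equal."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:                      # the difference is linear or constant
        if abs(b) < 1e-12:
            return []
        return [t for t in [-c / b] if tb < t < te]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    times = {(-b - root) / (2 * a), (-b + root) / (2 * a)}
    return sorted(t for t in times if tb < t < te)

def lower_envelope(funcs, tb, te):
    """Brute force: list of (function index, start, end) pieces of the envelope."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2.0           # no crossing inside, so the midpoint decides
        values = [a * mid * mid + b * mid + c for a, b, c in funcs]
        best = values.index(min(values))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], end)   # merge adjacent pieces
        else:
            pieces.append((best, start, end))
    return pieces

# Hypothetical input: one object passing by (squared distance (t - 5)^2 + 1)
# and one object at constant distance 2 (squared distance 4).
pieces = lower_envelope([(1.0, -10.0, 26.0), (0.0, 0.0, 4.0)], 0.0, 10.0)
print(pieces)
```

Comparing squared distances is enough here, because the square root is monotonic and does not change which function is lowest.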

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time points t1, t2 and t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t), i.e.
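The definition of possible paths can be sketched as a bounded depth-first enumeration. This sketch assumes, for simplicity, that both samples lie on vertices and that the graph stores the minimum travel time per edge; the graph format and the vertex names are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """Enumerate simple paths src -> dst whose minimum travel time fits the budget.

    graph: {vertex: {neighbor: minimum_travel_time}} (hypothetical toy format)
    budget: the time between the two samples, t_{i+1} - t_i
    """
    results = []

    def extend(path, cost):
        vertex = path[-1]
        if vertex == dst:
            results.append((list(path), cost))
            return
        for neighbor, time_cost in graph.get(vertex, {}).items():
            # Skip visited vertices and extensions that exceed the time budget.
            if neighbor not in path and cost + time_cost <= budget:
                path.append(neighbor)
                extend(path, cost + time_cost)
                path.pop()

    extend([src], 0.0)
    return results

# Hypothetical road network with two routes from v1 to v5.
graph = {"v1": {"v5": 3.0, "v3": 2.0}, "v3": {"v4": 2.0}, "v4": {"v5": 2.0}, "v5": {}}
print(possible_paths(graph, "v1", "v5", 6.0))
print(possible_paths(graph, "v1", "v5", 5.0))
```

With a uniform pdf over the possible paths, each of the k returned paths would receive probability 1/k.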

73

Combine the two probabilities…

[Figure: possible paths, and possible locations when t=4, for object a; point m marked on one path]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr(p \in S \mid PL_P(t))$$

e.g., uniform pdf
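A minimal sketch of combining the two probabilities, assuming a uniform pdf over the possible locations of each path and positions expressed as 1D offsets along a path. The interval values are hypothetical and chosen only to reproduce the 0.5 · 0.5 = 0.25 example.

```python
def overlap(a, b):
    """Length of the overlap of two 1D intervals given as (low, high)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def prob_in_segment(paths):
    """paths: list of (path_probability, possible_locations, segment_on_path).

    possible_locations and segment_on_path are (low, high) offsets along the
    path; segment_on_path is None if the path does not contain the segment.
    Assumes a uniform pdf over the possible locations of each path.
    """
    total = 0.0
    for pr_path, pl, seg in paths:
        if seg is not None:
            total += pr_path * overlap(pl, seg) / (pl[1] - pl[0])
    return total

# Hypothetical numbers: two equally likely paths; on the first one, the
# segment covers half of the possible locations at t = 4.
paths_at_t4 = [(0.5, (2.0, 4.0), (3.0, 5.0)), (0.5, (1.0, 3.0), None)]
print(prob_in_segment(paths_at_t4))
```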

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbles" along road-network segments; annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]
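The network expansion step can be sketched as a Dijkstra search that stops at network distance r. This simplified version assumes q sits on a vertex and returns vertices rather than (partially covered) edges; the graph format and all names are hypothetical.

```python
import heapq

def expansion_tree(graph, q, r):
    """Vertices within network distance r of q, with distance and tree parent."""
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            # Only expand within the query radius r.
            if nd <= r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# Hypothetical road network: {vertex: {neighbor: edge length}}
graph = {
    "q": {"A": 1.0, "B": 4.0},
    "A": {"q": 1.0, "C": 1.5},
    "B": {"q": 4.0},
    "C": {"A": 1.5, "D": 2.0},
    "D": {"C": 2.0},
}
dist, parent = expansion_tree(graph, "q", 3.0)
print(dist)
print(parent)
```

The edges (parent[v], v) form the expansion tree; in the filtering step, only the movement R-trees of these edges would need to be fetched.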

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
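As an illustration of error-bounded data reduction, here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper; it uses the point-to-segment distance as the error measure, and `epsilon` plays the role of the acceptable error.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def douglas_peucker(points, epsilon):
    """Simplify a polyline; dropped points stay within epsilon of the result."""
    if len(points) < 3:
        return list(points)
    distances = [point_segment_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= epsilon:
        return [points[0], points[-1]]
    split = worst + 1                       # index of the farthest point
    left = douglas_peucker(points[:split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right                # do not duplicate the split point

# Hypothetical location samples.
track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.1))
```

For trajectories, the time dimension would need separate treatment, as the next slide points out; this sketch is purely spatial.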

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example…

[Figure: position (pos) over time (t) with query window Q; the possible positions are bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example…

[Figure: position (pos) over time (t) with query window Q; the possible positions are bounded by vmin and vmax]

Assuming uniform distribution…

86

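A highway example…

[Figure: position (pos) over time (t) with query window Q; 50% and 30% of the possible positions fall inside Q at the two time steps]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65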

87

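A highway example…

[Figure: position (pos) over time (t) with query window Q; 50% and 30% of the possible positions fall inside Q at the two time steps]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints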

88

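A highway example…

[Figure: position (pos) over time (t) with query window Q; 50% and 30% of the possible positions fall inside Q at the two time steps]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5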
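The gap between 0.65 and 0.5 can be reproduced with a small set of explicit possible worlds. The ten equally likely worlds below are a hypothetical construction, chosen so that the marginals are 50% and 30% and every world inside Q at the second time step is also inside Q at the first.

```python
from fractions import Fraction

# Hypothetical possible worlds: ten equally likely constant-speed trajectories.
# Each entry records whether the car is inside Q at the first / second time step.
worlds = (
    [(True, True)] * 3       # inside Q at both time steps
    + [(True, False)] * 2    # inside Q only at the first time step
    + [(False, False)] * 5   # never inside Q
)

def marginal(step):
    """Probability of being inside Q at one time step, ignoring the other."""
    return Fraction(sum(w[step] for w in worlds), len(worlds))

p1, p2 = marginal(0), marginal(1)

# Treating the two time steps as independent overestimates the result.
independent = 1 - (1 - p1) * (1 - p2)

# Possible worlds semantics: count the worlds that are inside Q at least once.
exact = Fraction(sum(any(w) for w in worlds), len(worlds))

print(float(independent), float(exact))
```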

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

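› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; in the interior: 0.4 (left neighbour), 0.2 (stay), 0.4 (right neighbour); at the border: 0.6 (move to the neighbour), 0.4 (stay)]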

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this?

› First hit:
(0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit:
(0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit:
(0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

where M is the 9×9 transition matrix of the board example (rows: from bucket, columns: to bucket)
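The vector-matrix products of the board example can be sketched in a few lines of plain Python. The function names are hypothetical, and the matrix is the 9×9 transition matrix M of the example.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6            # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4    # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

M = build_board_matrix()
initial = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # the ball starts in the third bucket
first = step(initial, M)
second = step(first, M)
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])

# The vector shown for the 40th hit is a fixed point of M (stationary distribution).
stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in step(stationary, M)])
```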

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: state diagram of s1, s2 and s3 with the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

t=0: (1, 0, 0)

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3, with the transition probabilities of M]

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

t=0: (1, 0, 0)
t=1: (0, 0, 1)

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3, with the transition probabilities of M]

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3, with the transition probabilities of M]

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3, with the transition probabilities of M]

102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0, 0.16, 0.04), counting only the worlds that have not already hit the window at t=2

Result = 0.8 + 0.16 = 0.96

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3, with the transition probabilities of M]

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

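With the additional state s4 for the trajectories that have intersected the window (rows: from state, columns: to state, over s1, s2, s3, s4):

M− =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M+ =
( 0  0  1    0   )
( 0  0  0.4  0.6 )
( 0  0  0.2  0.8 )
( 0  0  0    1   )

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)

M− is used for transitions to time steps outside the query window, M+ for transitions to time steps inside it.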
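A minimal sketch of the window query with one additional absorbing "hit" state and the two matrices M− and M+. It assumes the query interval starts after t = 0; the function names are hypothetical and the matrix is the three-state M of the example.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def augment(m, window_states, active):
    """Add an absorbing hit state; if active, redirect moves into the window to it."""
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (active and j in window_states) else j
            out[i][target] += m[i][j]
    out[n][n] = 1.0                        # once hit, always hit
    return out

def step(dist, m):
    """Multiply the row vector dist by the matrix m."""
    return [sum(dist[i] * m[i][j] for i in range(len(m))) for j in range(len(m))]

def window_probability(m, start, window_states, t_from, t_to):
    """P(object is in one of window_states at some t in [t_from, t_to]), t_from >= 1."""
    dist = [0.0] * (len(m) + 1)
    dist[start] = 1.0
    m_minus = augment(m, window_states, active=False)
    m_plus = augment(m, window_states, active=True)
    for t in range(1, t_to + 1):
        dist = step(dist, m_plus if t >= t_from else m_minus)
    return dist[-1]

# Start in s1 (index 0); window states s1 and s2 (indices 0 and 1); T = [2, 3].
print(round(window_probability(M, 0, {0, 1}, 2, 3), 2))
```

The cost is one sparse vector-matrix product per time step, instead of an enumeration of the exponentially many possible paths.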

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with one observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: states s1, s2 and s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

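Classes of equivalent worlds, in the order (S1-, S2-, S3-, S1+, S2+, S3+), where ∎ = has intersected the window and not ∎ = has not:

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

M− =
( M  0 )
( 0  M )

M+ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )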

107

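Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting "the trajectory has intersected the query window" and obs denoting the observation (the object is seen in s3):

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

State vector at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)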
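The 2|S| classes and the conditioning step can be sketched by propagating two vectors, one for the worlds that have not yet intersected the window and one for those that have. The names are hypothetical; the matrix, the window (s1 or s2 during T = [2, 3]) and the observation in s3 at t = 3 follow the running example.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def step_classes(minus, plus, m, window_states, active):
    """Advance the 'not yet hit' (minus) and 'already hit' (plus) vectors by one tick."""
    n = len(m)
    new_minus, new_plus = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            new_plus[j] += plus[i] * m[i][j]           # hit worlds stay hit
            if active and j in window_states:
                new_plus[j] += minus[i] * m[i][j]      # world enters the window now
            else:
                new_minus[j] += minus[i] * m[i][j]
    return new_minus, new_plus

minus, plus = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]         # the object starts in s1
for t in range(1, 4):
    minus, plus = step_classes(minus, plus, M, {0, 1}, active=(2 <= t <= 3))

# Condition on the second observation: the object is seen in s3 (index 2) at t = 3.
observed = 2
posterior = plus[observed] / (plus[observed] + minus[observed])
print([round(x, 2) for x in minus], [round(x, 2) for x in plus])
print(round(posterior, 2))
```

Keeping the hit information per state is what makes the conditioning possible: the single absorbing state of the one-observation case would lose the location of the hit worlds.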

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index entries in the location/time space; label Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
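A minimal sketch of the plain Monte-Carlo estimate, assuming a single observation at t = 0 and the three-state chain of the window-query example. It does not implement the adapted transition matrices of the cited paper, which are needed to make samples conform to later observations; all names are hypothetical.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_trajectory(m, start, length, rng):
    """Draw one trajectory (list of states) with the given number of transitions."""
    states = [start]
    for _ in range(length):
        states.append(rng.choices(range(len(m)), weights=m[states[-1]])[0])
    return states

def estimate(m, start, predicate, length, n_samples, seed=0):
    """Fraction of sampled trajectories that satisfy the query predicate."""
    rng = random.Random(seed)
    hits = sum(predicate(sample_trajectory(m, start, length, rng))
               for _ in range(n_samples))
    return hits / n_samples

def in_window(trajectory):
    """Query predicate: the object is in s1 or s2 (index 0 or 1) at t = 2 or t = 3."""
    return any(state in (0, 1) for state in trajectory[2:4])

approx = estimate(M, 0, in_window, length=3, n_samples=20000)
print(approx)
```

The predicate is arbitrary Python, so the same loop also covers queries such as kNN for which no closed-form matrix solution is at hand; the estimate should land near the exact window-query result of 0.96.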

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
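To illustrate only the analogy to regular expressions, the sketch below matches the "Joe got coffee" pattern against one crisp (certain) location stream. This is not Lahar syntax, and the symbol encoding is hypothetical; on a probabilistic stream, a system in the spirit of the cited work would return a probability instead of a yes/no answer.

```python
import re

# Hypothetical encoding of one crisp location stream for Joe,
# one symbol per time step: O = office, H = hallway, C = coffee room.
got_coffee = re.compile(r"O.*C.*O")

def satisfies(stream):
    """True if the stream contains office, then coffee room, then office again."""
    return got_coffee.search(stream) is not None

print(satisfies("OOHCCHOO"))   # office -> coffee room -> office
print(satisfies("OOHHHOO"))    # never reaches the coffee room
```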

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)

55

Modeling Uncertainty

Location Uncertainty Models

Various Sources' Imprecision (GPS, Sensors)

Quest for data reduction

Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

56

Modeling Uncertain Trajectories

• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)

• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples

Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010

57

Modeling Uncertain Trajectories

Model 2

Constraints

Motion along road network

Uncertainty restricted to Road Segments

[Figure: space-time prism on a road network between samples p1 (at t1) and p2 (at t2), with a query point q; time axis t]

Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain Trajectories

Model 3: Sheared Cylinders

Constant bound on location-error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time marks T = tb, T = t1 and T = te]

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given: Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 with uncertainty in (X, Y, T) space, with time marks T = tb, T = tb1, T = t1, T = t11 and T = te]

Observations:

Adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

⇒ What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: (Tri1, [tb, t1], D1), (Tri2, [t1, t2], D2), …, (Trim, [t(m-1), te], Dm). Each node is refined by children over sub-intervals of its own interval, e.g. (Tri11, [tb, t11], D11), (Tri12, [t11, t12], D12), …, with further descendants such as (Tri121, [t11, t121], D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q surrounded by the uncertainty disks of Tr1 … Tr5; Rmin and Rmax are marked as distances from Q, Rd is a distance within that range, and one disk is annotated "Rmin > Rmax"]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
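
The two formulas referred to on this slide did not survive in the slide text. A sketch of the usual formulation in the probabilistic NN literature is given below, assuming that every object Tr_i has a location pdf with bounded support and that objects are mutually independent. The symbols P^{WD}_i, pdf^{WD}_j and P^{NN}_j are notation chosen here for illustration and are not taken from the slides.

```latex
% Probability that Tr_i lies within distance R_d of the crisp query point Q
P^{WD}_i(R_d) = \iint_{\{(x,y)\,:\,\|(x,y) - Q\| \le R_d\}} pdf_i(x, y)\, dx\, dy

% Density of the distance between Tr_j and Q
pdf^{WD}_j(R_d) = \frac{d}{dR_d}\, P^{WD}_j(R_d)

% Tr_j is the NN if it is at distance R_d and every other object is farther away;
% integrate over all possible R_d up to R_{max}
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d
```

The double integral in the first line is the "double integration" that the next slide contrasts with the quadruple integration needed for an uncertain query object.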

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ surrounded by the uncertainty disks of Tr1 … Tr5, with Rmin and Rmax marked; Z1 and Z2 are two possible locations with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane; each has a circular support of diameter 2r]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that Vi and Vq are within distance Rd of each other

Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq
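
The quantity sought and the convolution can be written out as follows. This is a sketch that assumes Vi and Vq are independent, as stated on the slide; the integration variables u and v are introduced here for illustration.

```latex
% Probability that Tr_i is within distance R_d of the uncertain query object
P\bigl(\|V_i - V_q\| \le R_d\bigr) = \iint_{\|v\| \le R_d} pdf_{V_{iq}}(v)\, dv

% pdf of the difference V_{iq} = V_i + (-V_q) as a convolution
pdf_{V_{iq}}(v) = \bigl(pdf_{V_i} * pdf_{-V_q}\bigr)(v) = \iint pdf_{V_i}(u)\, pdf_{V_q}(u - v)\, du
```

Once pdf_{V_{iq}} is known, the problem has the same shape as the crisp-query case: the difference variable plays the role of the uncertain object and the origin plays the role of the query point.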

67

[Figure: the convolved pdfs pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane; the support of each has diameter 4r]

Let Viq = Vi + (−Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
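
The distance function itself is missing from the slide text. A reconstruction under the assumption of linear motion within the interval is sketched below, where (x_{iq}, y_{iq}) denotes the expected location along TRiq at t = 0 (a symbol introduced here for illustration):

```latex
% Expected location along TR_{iq} at time t
\bigl(x_{iq} + V_{x,iq}\, t,\;\; y_{iq} + V_{y,iq}\, t\bigr)

% Distance from the origin
d_{iq}(t) = \sqrt{A t^2 + B t + C}

% Coefficients
A = V_{x,iq}^2 + V_{y,iq}^2, \qquad
B = 2\,\bigl(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}\bigr), \qquad
C = x_{iq}^2 + y_{iq}^2
```

A is strictly positive whenever the relative velocity is non-zero, which is the condition under which the graph of d_{iq}(t) is a branch of a hyperbola.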

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example of the lower envelope LE12 of two distance-trajectories TR1 and TR2, plotted as distance (vertical axis) over time (horizontal axis); critical time-points t11 and t12 follow tb]
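
The "at most twice" observation follows from equating the squared distances, which leaves a quadratic in t. A minimal Python sketch is given below, assuming each distance function is represented by the coefficient triple (A, B, C) of its squared distance A·t² + B·t + C; the function name and this representation are illustrative assumptions, not part of the slides.

```python
import math


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Return the times in [tb, te] at which two distance functions intersect.

    f and g are hypothetical (A, B, C) triples describing squared distances
    d(t)^2 = A*t^2 + B*t + C. Distances are non-negative, so the distance
    functions agree exactly where the squared distances agree.
    """
    a = f[0] - g[0]
    b = f[1] - g[1]
    c = f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical or never-crossing curves: no isolated crossing
        roots = [-c / b]  # the difference is linear: one crossing
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []  # the curves never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Two distance functions that cross twice inside [-5, 5]
print(critical_time_points((1, 0, 4), (2, 0, 0), -5, 5))
```

Between consecutive critical time-points the order of the two functions is fixed, which is what allows the lower envelope to be described piecewise.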

70

The continuous aspect of the probabilistic NN queries

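Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach in the spirit of Merge-Sort

[Figure: three distance-over-time plots. Left: lower envelope LE12 of TR1 and TR2 with critical time-points t11 and t12. Middle: lower envelope LE34 of TR3 and TR4 with critical time-point t31. Right: merged lower envelope LE1234 over [tb, te], with the additional critical time-points t1new and t2new created by the merge]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?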

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is farther than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t), i.e.
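
The set definitions that followed "i.e." are missing from the slide text. They can be written out from the verbal definitions as below, where tc_{min} denotes the minimum time cost of travelling along a path or between two positions (a symbol assumed here for illustration):

```latex
% Possible paths between two consecutive samples
PP_i(a) = \bigl\{\, P_j \;:\; P_j \text{ connects } p_i \text{ and } p_{i+1},\;\;
          tc_{min}(P_j) \le t_{i+1} - t_i \,\bigr\}

% Possible locations on a path P_j at time t \in [t_i, t_{i+1}]
PL_j(a, t) = \bigl\{\, p \in P_j \;:\; tc_{min}(p_i, p) \le t - t_i \;\wedge\;
             tc_{min}(p, p_{i+1}) \le t_{i+1} - t \,\bigr\}
```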

73

Combine the two probabilities…

[Figure: possible paths, and possible locations when t = 4, of object a; a segment m is highlighted]

Probability of falling inside segment m: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

\Pr(a \in S,\, t) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(PL_P(a, t) \in S\bigr)

(e.g., uniform pdf)

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and probabilities)
  – Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure stored for vertices
    › On-disk, retrieve on demand

76

› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates

› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; expansion tree "bubbling" along road-network segments]

77

Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

  – Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

  – The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
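
To make "define acceptable error, save on data size" concrete, here is a minimal sketch of the basic recursive Douglas-Peucker simplification in Python. It is the textbook version, not the sped-up variant of the cited paper, and it treats samples as plain 2D points, ignoring the time dimension. The function names and the `epsilon` parameter are illustrative assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]  # the whole run is within tolerance
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.1))
```

The dropped points can deviate from the reduced polyline by at most `epsilon`, which is exactly the kind of bounded location error that the geometric uncertainty models assume.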

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension

vs.

Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension

82

Merge the approaches

› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given

› Examples
  – Nearest Neighbour Queries on uncertain moving objects
  – Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

[Figure: time/position diagram (axes t and pos) of a car on a highway, with the query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the reachable positions bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with vmin, vmax and Q]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; the two time steps that overlap Q are labeled 50 and 30 (percent)]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: as before, with the labels 50 and 30]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before, with the labels 50 and 30]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
  – Discrete Time + Discrete Space (e.g., Markov Chain)
  – Discrete Time + Continuous Space (e.g., Harris Chain)
  – Continuous Time + Discrete Space (e.g., Markov Process)
  – Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes. Interior hole: move left 0.4, stay 0.2, move right 0.4. Border hole: probabilities 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

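How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}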

94

How can we model this?

› First hit:
(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit:
(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit:
(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

where M is the 9 × 9 transition matrix of the ball-and-board example
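
The vector-matrix products on this slide can be replayed with a few lines of plain Python. This is a minimal sketch; the helper names `build_transition_matrix`, `step` and `after_hits` are made up for illustration, and the matrix is the 9-bucket example from the slides.

```python
def build_transition_matrix(n=9):
    """Row-stochastic matrix of the ball-and-board example (row = from bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4  # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, m, k):
    """Distribution after k hits, starting from dist."""
    for _ in range(k):
        dist = step(dist, m)
    return dist


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # the ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
for dist in (first, second, fortieth):
    print([round(p, 2) for p in dist])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, i.e. it is unchanged by a further multiplication with M, so the distribution after many hits approaches it regardless of the starting bucket.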

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}

[Figure: transition graph over the states s1, s2, s3, with the transition probabilities of M on the edges]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}

t = 0: (1, 0, 0)

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3, with the transition probabilities on the edges]

99

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

100

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

101

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

102

ST - Window Queries

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3, restricted to the worlds that have not entered the window at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories (those that have already intersected the window) and two matrices

103

ST - Window Queries

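States: s1, s2, s3, and the new absorbing state s4

M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}

(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)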
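
The two-matrix scheme can be sketched in plain Python as follows. The function `window_probability` and its parameters are hypothetical names used for illustration; the sketch assumes a time-homogeneous chain and a window that does not include t = 0.

```python
def vec_mat(v, m):
    """Multiply the row vector v by the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_probability(M, start, window_states, window_times, t_end):
    """Probability of being in a window state at least once during window_times.

    Builds M_minus (ordinary transitions plus an absorbing "hit" state) and
    M_plus (transitions into window states are redirected to the hit state).
    """
    n = len(M)
    hit = n  # index of the artificial absorbing state
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        new_row = [0.0 if j in window_states else row[j] for j in range(n)]
        new_row.append(sum(row[j] for j in window_states))
        m_plus.append(new_row)
    m_plus.append([0.0] * n + [1.0])

    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        # Use M_plus for transitions that arrive at a time inside the window
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[hit]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States s1, s2 are indices 0, 1; the window covers the times 2 and 3
p = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3)
print(round(p, 2))
```

Because the hit state is absorbing, every possible world is counted at most once, no matter how often it re-enters the window; this is what the independence-based computation in the highway example gets wrong.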

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space. Left: possible locations spreading out from a single observation at t0. Right: possible locations constrained by two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}

[Figure: the states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]

106

Multiple Observations

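Class order: (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds that have not intersected the window (¬∎)

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}

M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}

M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}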

107

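Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}

With the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over the classes (S1-, S2-, S3-, S1+, S2+, S3+):

P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89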
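
The conditioning step can be sketched in Python by tracking the "minus" and "plus" classes as two separate vectors, which is equivalent to multiplying with the 2|S| × 2|S| block matrices. The function name and its parameters are illustrative assumptions, and the window is assumed not to include t = 0.

```python
def passes_window_given_observation(M, start, window_states, window_times,
                                    t_obs, s_obs):
    """P(object intersected the window | object observed in s_obs at t_obs)."""
    n = len(M)
    minus = list(start)   # worlds that have not intersected the window yet
    plus = [0.0] * n      # worlds that have intersected the window
    for t in range(1, t_obs + 1):
        new_minus = [sum(minus[i] * M[i][j] for i in range(n)) for j in range(n)]
        new_plus = [sum(plus[i] * M[i][j] for i in range(n)) for j in range(n)]
        if t in window_times:
            # Worlds arriving in a window state move to the "plus" classes
            for j in window_states:
                new_plus[j] += new_minus[j]
                new_minus[j] = 0.0
        minus, plus = new_minus, new_plus
    joint_hit = plus[s_obs]     # P(obs and window intersected)
    joint_miss = minus[s_obs]   # P(obs and window not intersected)
    # Assumes the observation is possible, i.e. the denominator is non-zero
    return joint_hit / (joint_hit + joint_miss)


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Window: states s1, s2 (indices 0, 1) at times 2 and 3; observation: s3 at t = 3
p = passes_window_given_observation(M, [1.0, 0.0, 0.0], {0, 1}, {2, 3}, 3, 2)
print(round(p, 2))
```

The denominator is the total probability of the observation, so worlds that are inconsistent with the second observation drop out of the answer.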

108

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries for an object (P, oid) drawn as boxes in the time/location space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
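
For comparison, the naive alternative to adapted transition matrices is rejection sampling: draw trajectories from the unconditioned chain and discard those that contradict the observation. The Python sketch below shows only this baseline, not the method of the cited paper; the function names, the sample count and the seed are illustrative assumptions.

```python
import random


def sample_path(M, start_state, t_end, rng):
    """Draw one trajectory (list of state indices) from the Markov chain M."""
    path = [start_state]
    for _ in range(t_end):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def estimate_window_probability(M, start_state, window_states, window_times,
                                t_obs, s_obs, n_samples=20000, seed=0):
    """Monte-Carlo estimate of P(window intersected | seen in s_obs at t_obs)."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start_state, t_obs, rng)
        if path[t_obs] != s_obs:
            continue  # sample contradicts the observation: reject it
        accepted += 1
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / accepted if accepted else float("nan")


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1; window: s1 or s2 at times 2 and 3; observation: s3 at t = 3
estimate = estimate_window_probability(M, 0, {0, 1}, {2, 3}, 3, 2)
print(estimate)
```

Every rejected sample is wasted work, and the rejection rate grows quickly with the number of observations, which is the inefficiency that motivates adapting the transition matrices so that every drawn sample already conforms to the observations.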

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric Models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


56

Modeling Uncertain Trajectories

bull Model 1Cones Beads Necklaces (AKA space-time prisms)

bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples

Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010

57

Modeling Uncertain TrajectoriesModel 2

Constraints

Motion along road network

Uncertainty restricted to Road Segments

p1 p2

q

t2t1

t

Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)

58

Modeling Uncertain TrajectoriesModel 3Sheared Cylinders

Constant bound on location-error at anytime-instant

G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints

Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement

eg possible routes to be taken

6060

Example inside range (disk) around ldquoQrdquo between t1 and t2

1 What is the probability of a path taken

2 What is the earliestlatest arrival at a given vertex (consequently along edge)

Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011

61

Continuous NN for trajectories

Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]

A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]

T = tb

T = te

X

Y

T = t1

Trq Tr1 Tr2Tr3

Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN

Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]

Observation The answer to the query is time-paremeterized

Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002

62

Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant

UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]

T = te

X

YT = tb

T = t1

Trq Tr1 Tr2Tr3

T = tb1

T = t11

Observations

adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability

= what should the structure ofthe answer to UQ_nn(q) look like

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each node has children covering sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, which in turn have children such as (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, with the distances Rmin, Rmax and Rd marked; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
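The pruning observation above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that Rmax is the smallest of the farthest possible distances from Q to any uncertainty disk, and the function name and the example coordinates are hypothetical.

```python
import math

def prune_nn_candidates(q, disks):
    """q: crisp query point (x, y); disks: name -> ((cx, cy), radius)."""
    def center_dist(c):
        return math.hypot(c[0] - q[0], c[1] - q[1])

    # Assumption: R_max is the smallest "farthest possible distance" of any object.
    r_max = min(center_dist(c) + r for c, r in disks.values())
    # Keep only objects whose closest possible location is not beyond R_max.
    candidates = [name for name, (c, r) in disks.items()
                  if max(center_dist(c) - r, 0.0) <= r_max]
    return r_max, candidates

# Hypothetical layout: Tr4 is too far away to ever be the nearest neighbor.
disks = {"Tr1": ((2.0, 0.0), 1.0), "Tr2": ((0.0, 3.0), 1.0), "Tr4": ((6.0, 0.0), 1.0)}
r_max, candidates = prune_nn_candidates((0.0, 0.0), disks)
```

Only the surviving candidates need the (expensive) probability integration.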

The probability that the location of Tri is within distance Rd from Q is obtained by integrating pdf(Tri) over the part of its uncertainty disk that lies within distance Rd of Q.

The probability that a given object Trj is the NN of Q follows from the observation that Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax).
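The two expressions referred to above can be written out as follows. This is a hedged reconstruction from the verbal description only; the symbols $P^{WD}_i$, $pdf^{WD}_j$ and $A_i(R_d)$ are notation assumed here and may differ from the original slide.

```latex
% A_i(R_d): part of the uncertainty disk of Tr_i within distance R_d of Q (assumed notation)
P^{WD}_i(R_d) = \iint_{A_i(R_d)} pdf_i(x, y)\, dx\, dy

% pdf^{WD}_j is the derivative of P^{WD}_j with respect to R_d
P^{NN}_j = \int_{0}^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \neq j} \left(1 - P^{WD}_i(R_d)\right) dR_d
```

The product term is the probability that every other object is farther than $R_d$, which relies on the objects' locations being independent.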

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable.

[Figure: uncertainty disks of TrQ and Tr1, …, Tr5, with Rmin and Rmax marked; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]

Example: we can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide).

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the (x, y) plane, each supported on a disk of diameter 2r]

Example (assuming a uniform pdf of possible locations): the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.

Define Viq = Vi - Vq (cross-correlation) as another random variable. Since Viq = Vi + (-Vq), independence implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq.
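A discrete, one-dimensional analogue makes the convolution argument concrete. The sketch below is an illustration under simplifying assumptions (1D positions and pmfs instead of 2D pdfs); the function names and the example values are hypothetical.

```python
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i + (-V_q) for independent discrete V_i and V_q (a convolution)."""
    out = defaultdict(float)
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] += pi * pq
    return dict(out)

def prob_within(pmf_diff, rd):
    """P(|V_iq| <= rd): probability that the two objects are within distance rd."""
    return sum(p for d, p in pmf_diff.items() if abs(d) <= rd)

# Hypothetical uniform locations: object i on {4, 5, 6}, query object on {0, 1, 2}.
pmf_i = {4: 1 / 3, 5: 1 / 3, 6: 1 / 3}
pmf_q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
diff = difference_pmf(pmf_i, pmf_q)
within_3 = prob_within(diff, 3)
```

Once the pmf of the difference is known, the uncertain query object can be treated as a crisp point at the origin, which is the reduction the slide relies on.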

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the (x, y) plane, each supported on a disk of diameter 4r]

Let Viq = Vi + (-Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form $d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{xiq}^2 + V_{yiq}^2$: a hyperbola, since A > 0.

Now we have a collection of such distance functions (one for each i).

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
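Why two distance functions meet at most twice can be seen by squaring both sides: the squared distances are quadratics in t, so their difference has at most two real roots. The sketch below assumes linear motion of the expected locations; the function names and the example trajectories are hypothetical.

```python
import math

def relative_quadratic(p_i, v_i, p_q, v_q):
    """Coefficients (A, B, C) of the squared distance A*t^2 + B*t + C
    between two points moving linearly from p_i, p_q with velocities v_i, v_q."""
    dx, dy = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    return (vx * vx + vy * vy, 2 * (dx * vx + dy * vy), dx * dx + dy * dy)

def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] where the two (non-negative) distance functions are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:                      # difference is linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

# Hypothetical example: a static query at the origin, one moving and one static object.
d1 = relative_quadratic((-4.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))  # (t - 4)^2 + 1
d2 = relative_quadratic((0.0, 2.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))   # constant 4
crit = critical_time_points(d1, d2, 0.0, 10.0)
```

Identical distance functions are reported as having no critical time-points here, which is a simplification.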

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
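For comparison, a brute-force baseline is easy to write down. Note that this is deliberately not the divide-and-conquer construction from the slide: it collects all pairwise critical time-points and checks the winner on every elementary interval, which costs roughly O(N^2 log N) plus the evaluations. Distance functions are assumed to be given as (A, B, C) coefficients of the squared distance; all names are hypothetical.

```python
import math
from itertools import combinations

def roots_in(f, g, tb, te):
    """Critical time-points of two squared-distance quadratics strictly inside (tb, te)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:
        rs = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        rs = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                  (-b + math.sqrt(disc)) / (2 * a)]
    return [t for t in rs if tb < t < te]

def lower_envelope(funcs, tb, te):
    """funcs: name -> (A, B, C). Returns [(name, start, end), ...] covering [tb, te]."""
    cuts = {tb, te}
    for f, g in combinations(funcs.values(), 2):
        cuts.update(roots_in(f, g, tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        # Comparing squared distances is enough: the square root is monotone.
        best = min(funcs, key=lambda n: funcs[n][0] * mid * mid + funcs[n][1] * mid + funcs[n][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)   # extend the previous piece
        else:
            envelope.append((best, lo, hi))
    return envelope

env = lower_envelope({"TR1": (1.0, -8.0, 17.0), "TR2": (0.0, 0.0, 4.0)}, 0.0, 10.0)
```

Each returned piece corresponds to one time-parameterized answer interval, i.e. one node on the first level of the IPAC-NN tree.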

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1, …, TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band are marked]

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: the pdf of selecting a possible path is uniform.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t).
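The definition of possible paths can be illustrated with a small enumeration. This is a sketch under stated assumptions: both samples are taken to lie on vertices, edge weights are minimum travel times, and the graph, its weights and the function name are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose minimum time cost is <= budget.
    graph: vertex -> {neighbor: minimum travel time}."""
    paths = []

    def dfs(node, cost, path):
        if cost > budget:
            return
        if node == dst:
            paths.append((list(path), cost))
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, cost + w, path)
                path.pop()

    dfs(src, 0, [src])
    return paths

# Hypothetical network: a direct edge v1-v5 and a detour via v3 and v4.
graph = {
    "v1": {"v5": 4, "v3": 2},
    "v3": {"v1": 2, "v4": 2},
    "v4": {"v3": 2, "v5": 1},
    "v5": {"v1": 4, "v4": 1},
}
pp = possible_paths(graph, "v1", "v5", budget=6)     # t_{i+1} - t_i = 6
uniform_prob = 1 / len(pp)                            # uniform pdf over possible paths
tight = possible_paths(graph, "v1", "v5", budget=4)
```

With a budget of 6 both routes qualify; with a budget of 4 only the direct edge remains.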

Combine the two probabilities…

[Figure: possible paths and possible locations of object a at t = 4, with the segment mp1 highlighted]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with uniform pdfs):

$$\Pr(a \in S \text{ at } t) = \sum_{p \in PP} \Pr(p) \cdot \Pr\big(a \in S \mid PL^{a}_{p}(t)\big)$$

Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  - The collection of all the possible paths (and probabilities)
  - Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

Processing

› Data Structures
  - Edge hash-table
    › Small, in-memory
  - Movement R-tree
    › 1D, for each edge
  - Trajectories List
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure stored for vertices
    › On-disk, retrieve on-demand

› Network expansion
  - Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
  - Fetch the movement R-trees of the edges in the expansion tree
  - Search leaf entries whose maximum time interval intersects with tq
  - The objects pointed to by those leaf entries are candidates
› Refinement
  - The candidate set is considerably smaller than the original trajectory dataset
  - Calculate the qualification probability for each candidate

[Figure: road network with points A, B, C, D, E and query location q. Annotations: expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
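The network expansion step can be sketched as a distance-bounded Dijkstra search. The sketch makes two simplifying assumptions that are not from the slides: q sits on a vertex, and an edge is collected as soon as one of its endpoints is closer than r (so partially covered edges are included). The graph and all names are hypothetical.

```python
import heapq

def expansion_edges(graph, q, r):
    """graph: vertex -> {neighbor: edge length}. Returns (dist, edges) where dist holds
    the vertices with network distance < r from q and edges the edges touching them."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.add(frozenset((u, v)))          # u is within range, so the edge is (partly) too
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges

# Hypothetical undirected road network.
graph = {
    "q": {"A": 2, "D": 10},
    "A": {"q": 2, "B": 2},
    "B": {"A": 2, "C": 5},
    "C": {"B": 5},
    "D": {"q": 10},
}
dist, edges = expansion_edges(graph, "q", r=5)
```

Only the movement R-trees of the collected edges would then be fetched in the filtering step.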

Temporal-continuous query
  - Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
  - Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
  - The QPs at the critical time instants define an envelope function for the actual QPs

Uncertainty: the flip-side

› Data reduction
  - Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define acceptable error, save on data-size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
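For reference, the basic (recursive) Douglas-Peucker simplification that the cited work speeds up can be sketched as follows. This is the textbook version, not the accelerated algorithm from the citation; `epsilon` plays the role of the acceptable error, and the example track is hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Drop points that deviate by at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right          # the split point appears in both halves

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(track, epsilon=0.5)
```

The dropped point (1, 0.05) is exactly the kind of sample that need not be transmitted; the price is a bounded location uncertainty of at most epsilon.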

Uncertainty: the other flip-side
› What are the important properties not to be distorted by reduction?
  - Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  - It is not a "Z" axis… one cannot travel back in time
  - How long should the sink data be allowed to be "un-fresh"?

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far …

Stochastic methods for uncertain spatial data
- Probabilistic results
- Time dimension

vs.

Geometric models for uncertain spatio-temporal data
- Probabilistic results
- Time dimension

Merge the approaches

› Idea
  - Discretize the time
  - For each time, a pdf (pmf) over the possible positions is given
› Examples
  - Nearest Neighbour Queries on uncertain moving objects
  - Similarity Search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

A highway example…

[Figure: position (pos) of a car over time (t), bounded by the speeds vmin and vmax, with a query window Q; assuming a uniform distribution, the car is inside Q with probability 50% at one time instant and 30% at another]

What is the probability that the car is in some area at least once during some time?

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

This counts worlds that are a violation of the speed constraints.

Consideration of dependency: = 0.5
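The gap between 0.65 and 0.5 can be reproduced with an explicit set of possible worlds. The world probabilities below are an assumption chosen to match the 50% and 30% marginals of the example, with every world that is inside Q at the second instant also inside Q at the first one; they are not taken from the slides.

```python
# Hypothetical possible worlds: (in Q at first instant, in Q at second instant) -> probability
worlds = {(True, True): 0.3, (True, False): 0.2, (False, False): 0.5}

p_first = sum(p for (a, _), p in worlds.items() if a)    # marginal at the first instant
p_second = sum(p for (_, b), p in worlds.items() if b)   # marginal at the second instant

# Treating the two instants as independent overestimates the result ...
independent = 1 - (1 - p_first) * (1 - p_second)
# ... compared to counting the worlds that are inside Q at least once.
dependent = sum(p for (a, b), p in worlds.items() if a or b)
```

The independence formula implicitly assigns probability to the world (outside, inside), which the speed constraints rule out in this example.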

An adequate model

90

Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  - Discrete Time + Discrete Space (e.g. Markov Chain)
  - Discrete Time + Continuous Space (e.g. Harris Chain)
  - Continuous Time + Discrete Space (e.g. Markov Process)
  - Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

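A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: wooden board with nine holes; from an inner hole the ball moves to either neighbour with probability 0.4 and stays with probability 0.2; at the border the probabilities are 0.6 and 0.4]

Distribution of the ball's position:
› Initial position: one hole with probability 1
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08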

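How can we model this?

› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$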

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
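The hits can be replayed with a few lines of plain Python. This is a minimal sketch; the helper names are hypothetical, and the matrix is built from the probabilities of the example (0.4 / 0.2 / 0.4 inside, 0.4 stay and 0.6 move at the border).

```python
def build_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Row i holds the probabilities of moving from bucket i to every bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = border_move, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m

def step(p, m):
    """One hit: row vector p times matrix m."""
    n = len(p)
    return [sum(p[i] * m[i][j] for i in range(n)) for j in range(n)]

M = build_matrix()
p0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]        # ball starts in the third bucket
p1 = step(p0, M)                        # after the first hit
p2 = step(p1, M)                        # after the second hit

p40 = p0
for _ in range(40):
    p40 = step(p40, M)

# The vector listed for the 40th hit is the stationary distribution of M:
stationary = [0.08] + [0.12] * 7 + [0.08]
check = step(stationary, M)             # one more hit leaves it unchanged
```

After a finite number of hits the distribution only approaches the stationary vector, so the values for the 40th hit are best read as an approximation.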

Fusion of Model and Reality

› Discretization of time and space
  - E.g. treat intersections as states and add additional states on long streets
  - The time interval corresponding to a tick is e.g. 20 sec
› Estimation of model parameters
  - Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  - Transition matrix can change over time and for different object groups
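One simple way to learn transition probabilities from historical data is to count observed transitions and normalize per source state. The slides do not say which estimator is used, so the counting approach below is an assumption; the function name and the toy history are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """trajectories: lists of states, one state per tick.
    Returns a sparse matrix as state -> {next_state: probability}."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# Hypothetical history of two cars, sampled once per tick.
history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]
probs = estimate_transitions(history)
```

Storing only the observed transitions keeps the matrix sparse, and separate matrices could be estimated per time of day or per object group.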

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with their transition probabilities]

Note: We have an exponential number of possible paths the car might take.

State distributions, starting in s1 at t=0:
- t=0: (1, 0, 0)
- t=1: (0, 0, 1)
- t=2: (0, 0.8, 0.2)
- t=3: (0.48, 0.16, 0.36)

Restricted to the worlds that have not entered the window at t=2, the vector at t=3 is (0, 0.16, 0.04), so Result = 0.96.

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories and two matrices

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(1, 0, 0, 0) \rightarrow (0, 0, 1, 0) \rightarrow (0, 0, 0.2, 0.8) \rightarrow (0, 0, 0.04, 0.96)$$
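The two-matrix idea can be sketched as follows. The code is an illustration rather than the published algorithm: it builds M- and M+ on the fly by adding one absorbing "hit" state, assumes that the start time 0 lies outside the window, and uses hypothetical names throughout.

```python
def window_hit_probability(m, start, window_states, window_times, horizon):
    """Probability that a Markov chain started in `start` at t=0 is in one of
    `window_states` at some time in `window_times` (times 1..horizon)."""
    n = len(m)
    hit = n                                   # index of the extra absorbing state (s4)

    def augmented(redirect):
        a = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(n):
                # M+ redirects transitions that enter the window to the hit state.
                target = hit if (redirect and j in window_states) else j
                a[i][target] += m[i][j]
        a[hit][hit] = 1.0                     # once hit, always hit
        return a

    m_minus, m_plus = augmented(False), augmented(True)
    p = [0.0] * (n + 1)
    p[start] = 1.0
    for t in range(1, horizon + 1):
        mat = m_plus if t in window_times else m_minus
        p = [sum(p[i] * mat[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return p[hit]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States s1, s2 are indices 0, 1; the window covers the times 2 and 3.
result = window_hit_probability(M, start=0, window_states={0, 1}, window_times={2, 3}, horizon=3)
```

The cost is one vector-matrix product per tick, independent of the exponential number of possible paths.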

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with two observations at t0 and t1]

› We need to track where true hit worlds are located
  - 2|S| classes of equivalent worlds
  - One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  - One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

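State vectors over the classes (S1-, S2-, S3-, S1+, S2+, S3+) for t = 0, 1, 2, 3:

$$(1, 0, 0, 0, 0, 0) \rightarrow (0, 0, 1, 0, 0, 0) \rightarrow (0, 0, 0.2, 0, 0.8, 0) \rightarrow (0, 0.16, 0.04, 0.48, 0, 0.32)$$

$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$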

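Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory passes the window and obs the observation:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Using the entries for S3+ and S3- of the final state vector (0, 0.16, 0.04, 0.48, 0, 0.32):

$$\frac{0.32}{0.32 + 0.04} = 0.89$$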
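Tracking the 2|S| classes and conditioning on a later observation can be sketched as below. One assumption is made to reproduce the state vectors of the example: the window covers s1 and s2 at t = 2 only, and the second observation (object seen in s3) is made at t = 3. Function and variable names are hypothetical.

```python
def condition_on_observation(m, start, window_states, window_times, horizon, observed_state):
    """Returns the final vector over (S1-, ..., Sn-, S1+, ..., Sn+) and
    P(window was hit | object observed in `observed_state` at time `horizon`)."""
    n = len(m)
    p = [0.0] * (2 * n)                 # first n entries: not hit yet, last n: already hit
    p[start] = 1.0
    for t in range(1, horizon + 1):
        nxt = [0.0] * (2 * n)
        in_window = t in window_times
        for i in range(n):
            for j in range(n):
                # Worlds that have not hit yet switch class when they enter the window.
                to = n + j if (in_window and j in window_states) else j
                nxt[to] += p[i] * m[i][j]
                # Worlds that have hit before stay in the "+" classes.
                nxt[n + j] += p[n + i] * m[i][j]
        p = nxt
    hit, miss = p[n + observed_state], p[observed_state]
    return p, hit / (hit + miss)        # Bayes: P(hit and obs) / P(obs)

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
vector, posterior = condition_on_observation(
    M, start=0, window_states={0, 1}, window_times={2}, horizon=3, observed_state=2)
```

The sketch does not guard against an observation with probability zero, in which case the division would fail.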

Summary

› Pros
  - Allows answering time-parametrized queries according to possible worlds semantics
  - Considers location dependencies over time
  - Scales up very well, since it is purely based on sparse matrix multiplications
  - Natively extendable to uncertain observations
  - Seems to work adequately on real-world data (more validation needed)
› Cons
  - Discrete time and space
  - Matching from time to ticks might not be the perfect modelling

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entry of an object over the time and location axes (labels: P, oid)]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
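The ratio estimate can be illustrated with plain forward sampling from a single observation. This sketch does not adapt the transition matrices to later observations, which is the harder problem named above; it reuses the three-state chain of the window-query example, and the function names, sample count and seed are arbitrary choices.

```python
import random

def sample_path(m, start, horizon, rng):
    """Draw one possible trajectory of length horizon + 1 from the Markov chain."""
    path, state = [start], start
    for _ in range(horizon):
        state = rng.choices(range(len(m)), weights=m[state])[0]
        path.append(state)
    return path

def estimate_window_probability(m, start, window_states, window_times,
                                horizon, n_samples, seed=0):
    """Fraction of sampled trajectories that are inside the window at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, {2, 3}, horizon=3, n_samples=20000)
```

The estimate should land close to the exact window-query result of 0.96; the same sampler works for queries (such as kNN) that have no closed-form matrix solution.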

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

Summary

› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds
› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)
› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints

Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement

eg possible routes to be taken

6060

Example inside range (disk) around ldquoQrdquo between t1 and t2

1 What is the probability of a path taken

2 What is the earliestlatest arrival at a given vertex (consequently along edge)

Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011

61

Continuous NN for trajectories

Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]

A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]

T = tb

T = te

X

Y

T = t1

Trq Tr1 Tr2Tr3

Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN

Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]

Observation The answer to the query is time-paremeterized

Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002

62

Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant

UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]

T = te

X

YT = tb

T = t1

Trq Tr1 Tr2Tr3

T = tb1

T = t11

Observations

adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability

= what should the structure ofthe answer to UQ_nn(q) look like

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree

Tri11 tb t11 D11

Tri12l t12(l-1) t12 D12k12

[TrQ tb te]

Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm

Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm

Tri121 t11 t121 D121

Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]

Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

Rmin gt Rmax

Rmin

Rmax

Q

Tr1

Tr2

Tr3

Tr4

Tr5

Rd1

1

14

Trq = 2D point Q The rest are possible-locations bounded by circles

Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)

65

When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example

Rmin

Rmax

TrQ

Tr1

Tr2

Tr3

Tr4

Tr5

Z1

Z2

dist(Z1Z2) lt Rmax

We can no longer ldquoprunerdquo Tr4 from consideration

The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq

NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.

Example: if the pdf of selecting a possible path is uniform, each possible path in PPi is selected with probability 1/|PPi|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t).

73

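Combine the two probabilities…

[Figure: road network showing the possible paths of object a and its possible locations when t = 4, including the segment between m and p1.]

Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., with uniform pdfs): sum, over all possible paths p ∈ PP(t), the probability of p times the probability that a lies in S given its possible locations PL_p^a(t) on p:

$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr(S \mid PL_p^a(t))$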
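The 0.5 · 0.5 = 0.25 computation can be reproduced with a small sketch. It assumes, purely for illustration, that the possible locations on each path are discretised into equally sized cells and that the pdf over them is uniform; the cell names and the function name are hypothetical.

```python
from fractions import Fraction

def prob_in_segment(paths, segment):
    """paths: list of (path_probability, set of possible-location cells).
    segment: set of cells. Assumes a uniform pdf over each path's cells."""
    total = Fraction(0)
    for path_prob, possible_cells in paths:
        if possible_cells:
            inside = len(possible_cells & segment)
            total += path_prob * Fraction(inside, len(possible_cells))
    return total

# Two equally likely possible paths; the segment covers half of the
# possible locations on the first path and none on the second.
paths = [
    (Fraction(1, 2), {"a1", "a2", "a3", "a4"}),
    (Fraction(1, 2), {"b1", "b2"}),
]
segment_m_p1 = {"a1", "a2"}
print(prob_in_segment(paths, segment_m_p1))
```

Exact fractions are used so that the product of the path probability and the location probability is not blurred by floating-point rounding.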

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search the leaf entries whose maximum time intervals intersect with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; expansion tree "bubbling" along road-network segments.]
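The network expansion step can be sketched as a bounded Dijkstra search. This is a simplified illustration, not the method of the cited paper: it assumes the query point q sits on a vertex, keeps only edges whose far endpoint lies within network distance r, and uses a hypothetical adjacency-list graph.

```python
import heapq

def expansion_tree(graph, q, r):
    """Dijkstra-style expansion from vertex q.
    graph: dict vertex -> list of (neighbour, edge_length).
    Returns tree edges (parent, child, network distance of child),
    restricted to children whose network distance from q is < r."""
    dist = {q: 0.0}
    tree, done = [], set()
    heap = [(0.0, q, None)]
    while heap:
        d, v, parent = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if parent is not None:
            tree.append((parent, v, d))
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < r and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w, v))
    return tree

# Hypothetical road network; edge lengths are network distances.
graph = {
    "q": [("A", 2.0), ("B", 5.0)],
    "A": [("q", 2.0), ("C", 2.0)],
    "B": [("q", 5.0)],
    "C": [("A", 2.0), ("D", 3.0)],
    "D": [("C", 3.0)],
}
print(expansion_tree(graph, "q", 6.0))
```

The edges returned here would be the ones whose movement R-trees are fetched in the filtering step; vertex D is never reached because its network distance (7.0) exceeds r.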

77

Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path

– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
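To make "define an acceptable error, save on data size" concrete, here is a minimal sketch of the basic recursive Douglas-Peucker simplification (not the sped-up variant of the cited paper). It assumes 2D points and measures the error as the distance to the chord segment; the sample track is hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))              # clamp the projection to the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, epsilon):
    """Keep the end points; recurse on the farthest point if it is
    farther than epsilon from the chord, otherwise drop the interior."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= epsilon:
        return [points[0], points[-1]]
    split = k + 1                           # index of the farthest point
    left = douglas_peucker(points[:split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right                # the split point appears once

track = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.5))
```

With an error bound of 0.5 the point (1, 0.1) is dropped, so only four of the five location samples would need to be transmitted.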

79

Uncertainty – the other-flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with the query window Q.]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the speed bounds vmin and vmax added.]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with the speed bounds vmin and vmax and the query window Q.]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; at the two time instants covered by Q, 50% and 30% of the possible positions fall inside Q.]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: as before, with the 50% and 30% overlaps.]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before, with the 50% and 30% overlaps.]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5
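The gap between 0.65 and 0.5 can be reproduced on a toy set of possible worlds. The ten equally likely worlds below are a hypothetical construction, chosen so that the marginals are 0.5 and 0.3 and every world inside Q at the second time instant was already inside Q at the first; they are not data from the slides.

```python
# Each world records whether the car is inside Q at the two time instants.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

def marginal(worlds, k):
    """Probability of being inside Q at time instant k."""
    return sum(w[k] for w in worlds) / len(worlds)

p1, p2 = marginal(worlds, 0), marginal(worlds, 1)

# Treating the two time instants as independent over-counts:
independent = 1 - (1 - p1) * (1 - p2)

# Counting possible worlds respects the dependency between the instants:
dependent = sum(any(w) for w in worlds) / len(worlds)

print(p1, p2, independent, dependent)
```

The independence formula counts combinations such as "outside Q at the first instant, inside Q at the second" that no possible world contains, which is why it overestimates the probability.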

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

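› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes. Interior hole: 0.4 to the left neighbour, 0.2 stay, 0.4 to the right neighbour. Border hole: 0.6 to the neighbour, 0.4 stay.]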

92

A simple example

› Initial Position

› After first hit: 0.4, 0.2, 0.4 (left neighbour, same hole, right neighbour)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

M =
( 0.4  0.6  0    0    0    0    0    0    0   )
( 0.4  0.2  0.4  0    0    0    0    0    0   )
( 0    0.4  0.2  0.4  0    0    0    0    0   )
( 0    0    0.4  0.2  0.4  0    0    0    0   )
( 0    0    0    0.4  0.2  0.4  0    0    0   )
( 0    0    0    0    0.4  0.2  0.4  0    0   )
( 0    0    0    0    0    0.4  0.2  0.4  0   )
( 0    0    0    0    0    0    0.4  0.2  0.4 )
( 0    0    0    0    0    0    0    0.6  0.4 )

94

How can we model this

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
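The hits can be replayed as repeated vector-matrix products. A minimal pure-Python sketch follows; the helper names are illustrative, and the matrix is rebuilt from the probabilities of the board example (0.4 / 0.2 / 0.4 in the interior, 0.4 stay and 0.6 move at the border).

```python
def step(vec, matrix):
    """One transition: row vector times transition matrix."""
    n = len(matrix)
    return [sum(vec[i] * matrix[i][j] for i in range(n))
            for j in range(len(matrix[0]))]

def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball in the third hole

after1 = step(start, M)
after2 = step(after1, M)
after40 = start
for _ in range(40):
    after40 = step(after40, M)

print([round(x, 2) for x in after1])
print([round(x, 2) for x in after2])
print([round(x, 2) for x in after40])
```

Because M is very sparse, each step touches only a few entries per state, which is the property the later slides rely on for scalability.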

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: state-transition graph over s1, s2, s3, labelled with the transition probabilities.]

Note: We have an exponential number of possible paths the car might take.

98

ST - Window Queries

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3. Start distribution at t=0: (1, 0, 0).]

99

ST - Window Queries

[Figure: as before. Distribution at t=1: (1, 0, 0) · M = (0, 0, 1).]

100

ST - Window Queries

[Figure: as before. Distribution at t=2: (0, 0.8, 0.2).]

101

ST - Window Queries

[Figure: as before. Distribution at t=3 when the window is ignored: (0.48, 0.16, 0.36).]

102

ST - Window Queries

[Figure: as before. Worlds that reach s1 or s2 within T count as hits and are not propagated further: of the distribution (0, 0.8, 0.2) at t=2, only the 0.2 in s3 is propagated, giving (0, 0.16, 0.04) at t=3.]

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories (those that have already intersected the window) and two matrices

103

ST - Window Queries

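State order: s1, s2, s3, s4 (s4 is the new state)

M− =
( 0    0    1    0 )
( 0.6  0    0.4  0 )
( 0    0.8  0.2  0 )
( 0    0    0    1 )

M+ =
( 0    0    1    0   )
( 0    0    0.4  0.6 )
( 0    0    0.2  0.8 )
( 0    0    0    1   )

(1, 0, 0, 0) · M− = (0, 0, 1, 0)
(0, 0, 1, 0) · M+ = (0, 0, 0.2, 0.8)
(0, 0, 0.2, 0.8) · M+ = (0, 0, 0.04, 0.96)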
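The two-matrix idea can be sketched as follows: M− leaves the chain unchanged outside the query time interval, while M+ redirects every transition into a window state to an extra absorbing state. This is an illustrative reading of the slide with hypothetical function names; it assumes 0-based state indices and a query interval that starts at t ≥ 1.

```python
def step(vec, matrix):
    """One transition: row vector times transition matrix."""
    n = len(matrix)
    return [sum(vec[i] * matrix[i][j] for i in range(n))
            for j in range(len(matrix[0]))]

def augmented_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(M)
    hit = n
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in M] + [absorbing[:]]
    m_plus = []
    for row in M:
        new = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new[hit if j in window_states else j] += p
        m_plus.append(new)
    m_plus.append(absorbing[:])
    return m_minus, m_plus

def window_probability(M, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus, m_plus = augmented_matrices(M, window_states)
    vec = list(start) + [0.0]
    for t in range(1, t_to + 1):
        vec = step(vec, m_plus if t >= t_from else m_minus)
    return vec[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

# Car starts in s1 (index 0); window = {s1, s2}; T = [2, 3].
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(round(result, 2))
```

The cost is one sparse vector-matrix product per time step, instead of an enumeration of the exponentially many possible paths.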

104

Multiple Observations

› So far we had only one observation, from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time: one with a single observation at t0, one with observations at t0 and t1.]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3.]

106

Multiple Observations

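State order: S1−, S2−, S3−, S1+, S2+, S3+ (Si−: window not intersected, ¬∎; Si+: window intersected, ∎)

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3.]

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

M− =
( M  0 )
( 0  M )

M+ =
( 0  0  1    0    0    0   )
( 0  0  0.4  0.6  0    0   )
( 0  0  0.2  0    0.8  0   )
( 0  0  0    0    0    1   )
( 0  0  0    0.6  0    0.4 )
( 0  0  0    0    0.8  0.2 )

M =
( 0    0    1   )
( 0.6  0    0.4 )
( 0    0.8  0.2 )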

107

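Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With "hit" denoting that the trajectory intersects the window (∎ on the slide) and "obs" denoting the observation in s3:

$P(hit \mid obs) = \frac{P(obs \mid hit) \cdot P(hit)}{P(obs)} = \frac{P(hit \wedge obs)}{P(obs)} = \frac{P(obs \wedge hit)}{P(obs \wedge hit) + P(obs \wedge \neg hit)}$

Using the state vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

$P(hit \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$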
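The doubled state space and the Bayes step can be replayed in a few lines. This is a sketch under the assumption that the classes are ordered (S1−, S2−, S3−, S1+, S2+, S3+), that the window covers s1 and s2 during T = [2, 3], and that the second observation places the object in s3 at t = 3; the function names are hypothetical.

```python
def step(vec, matrix):
    """One transition: row vector times transition matrix."""
    n = len(matrix)
    return [sum(vec[i] * matrix[i][j] for i in range(n))
            for j in range(len(matrix[0]))]

def doubled_matrices(M, window_states):
    """M- = diag(M, M); M+ moves transitions into window states
    from the 'not hit' classes to the corresponding 'hit' classes."""
    n = len(M)
    zero = [0.0] * n
    m_minus = [row[:] + zero for row in M] + [zero + row[:] for row in M]
    m_plus = []
    for row in M:                          # from a class Si-
        new = [0.0] * (2 * n)
        for j, p in enumerate(row):
            new[n + j if j in window_states else j] += p
        m_plus.append(new)
    for row in M:                          # from a class Si+ (stays 'hit')
        m_plus.append(zero + row[:])
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
m_minus, m_plus = doubled_matrices(M, {0, 1})

vec = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]       # t=0: object in s1, no hit yet
for t in range(1, 4):                       # window active for t = 2, 3
    vec = step(vec, m_plus if t >= 2 else m_minus)

observed = 2                                # object seen in s3 at t = 3
posterior = vec[n + observed] / (vec[observed] + vec[n + observed])
print([round(x, 2) for x in vec], round(posterior, 2))
```

The posterior only needs the two entries of the final vector that agree with the observation, which is why the classes Si− and Si+ are tracked separately.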

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: an index entry (labelled P, oid) in the time/location space.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
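The sampling ratio can be illustrated on the three-state window query from the earlier slides. The sketch below only conditions on a single observation (the start state), so it samples directly from the chain; drawing samples that conform with several observations needs the adapted transition matrices mentioned above, which are not shown. Names, sample size and seed are illustrative choices.

```python
import random

def sample_trajectory(M, start_state, length, rng):
    """Draw one possible trajectory of the Markov chain."""
    states = [start_state]
    for _ in range(length):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def estimate_window_probability(M, start_state, window_states,
                                t_from, t_to, n_samples, seed=0):
    """Ratio of sampled trajectories that visit the window in [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start_state, t_to, rng)
        if any(traj[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000)
print(estimate)
```

The estimate should land close to the exact window-query result of 0.96; the benefit of sampling is that the same loop works for queries, such as kNN, that have no comparably simple matrix formulation.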

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


58

Modeling Uncertain Trajectories
Model 3: Sheared Cylinders

Constant bound on the location error at any time-instant

G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004

59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement

e.g., possible routes to be taken

60

Example: inside range (disk) around "Q" between t1 and t2

1. What is the probability of a path being taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked.]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked.]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

⇒ What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Children of the root: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Each child has its own children over sub-intervals of its time-interval, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], …, and so on for deeper levels, e.g. [Tri121, t11, t121, D121].]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked; one disk is annotated Rmin > Rmax.]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

$P^{WD}_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} pdf_i(x, y)\, dx\, dy$

The probability that a given object Trj is the NN of Q is given by

$P^{NN}_j = \int_0^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \ne j} \left(1 - P^{WD}_i(R_d)\right) dR_d$

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…). Here pdf^{WD}_j denotes the derivative of P^{WD}_j with respect to Rd.

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query TrQ and the uncertainty disks of Tr1 to Tr5, with Rmin and Rmax marked; for two possible locations Z1 and Z2, dist(Z1, Z2) < Rmax.]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs of Trq, Tr1 and Tr2, drawn as cylinders over the (x, y) plane with base diameter 2r.]

Example: assuming a uniform pdf of possible locations, these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq.

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.

67

[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane, each with base diameter 4r.]

Let Viq = Vi + (−Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
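Both properties can be checked on a small discrete example. The sketch below is a 1D simplification with hypothetical pmfs (the slides work with 2D pdfs): it builds the distribution of Vi − Vq for independent variables, which is the convolution of the pmf of Vi with the pmf of −Vq, and then inspects its mean and symmetry.

```python
from fractions import Fraction

def difference_pmf(pmf_i, pmf_q):
    """pmf of Vi - Vq for independent discrete variables
    (each pmf is a dict: value -> probability)."""
    out = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] = out.get(xi - xq, Fraction(0)) + pi * pq
    return out

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

third = Fraction(1, 3)
pmf_i = {4: third, 5: third, 6: third}      # uniform, centred at 5
pmf_q = {0: third, 1: third, 2: third}      # uniform, centred at 1

pmf_iq = difference_pmf(pmf_i, pmf_q)
print(sorted(pmf_iq.items()), mean(pmf_iq))
```

The resulting distribution is centred at 5 − 1 = 4 and symmetric around it, mirroring Property 1 and Property 2; it is also twice as wide as either input, just as the support of the difference grows from 2r to 4r in the figures.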

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, has the form diq(t) = sqrt(A·t² + B·t + C), which is a hyperbola since A > 0.

Now we have a collection of such distance functions (one for each i).
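A short derivation of why the distance function is a hyperbola, written as a sketch under the assumption of linear relative motion. The symbols $x^0_{iq}$ and $y^0_{iq}$ (the expected relative location at $t = 0$) are notation introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
d_{iq}(t) &= \sqrt{\left(x^0_{iq} + V^x_{iq}\, t\right)^2 + \left(y^0_{iq} + V^y_{iq}\, t\right)^2}
           = \sqrt{A t^2 + B t + C}, \\
A &= \left(V^x_{iq}\right)^2 + \left(V^y_{iq}\right)^2, \qquad
B = 2\left(x^0_{iq} V^x_{iq} + y^0_{iq} V^y_{iq}\right), \qquad
C = \left(x^0_{iq}\right)^2 + \left(y^0_{iq}\right)^2 .
\end{align*}
% Completing the square for A > 0 (non-zero relative velocity):
\begin{equation*}
d_{iq}(t)^2 - A\left(t + \frac{B}{2A}\right)^2 = C - \frac{B^2}{4A} \;\ge\; 0 ,
\end{equation*}
% which is the equation of a hyperbola in the (t, d) plane.
```

The right-hand side is non-negative by the Cauchy-Schwarz inequality; when it is zero the two trajectories' expected locations meet and the hyperbola degenerates into a pair of lines.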

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

Combine the two probabilitieshellip

Possible pathsPossible locations when t=4

a

m

Probability of falling inside segment mp1

0505 = 025 (at t = 4)

Probability of an object a falling inside some segment S at given time instant t

)(

))(Pr()Pr())(Pr(tPPp

p

a

tPLSpSta eg uniform pdf

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of the ball. Interior hole: 0.4 (left), 0.2 (stay), 0.4 (right). Border hole: 0.4 (stay), 0.6 (neighbour).]

92

A simple example

› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)

› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› After 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

93

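How can we model this?

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]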

94

How can we model this?

› First hit

\[
(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
\]

› Second hit

\[
(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
\]

› 40th hit

\[
(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
\]

Here M is the 9 × 9 transition matrix of the board example: first row (0.4, 0.6, 0, …, 0), interior rows (…, 0.4, 0.2, 0.4, …), last row (0, …, 0, 0.6, 0.4).
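The hits on the board can be replayed with a few lines of code. This is a minimal sketch in plain Python (no matrix library); the helper names `build_board_matrix` and `step` are illustrative and not part of the tutorial.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                            # interior bucket
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third bucket

after_1 = step(start, M)
after_2 = step(after_1, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)
```

Applying `step` repeatedly is equivalent to multiplying the start vector with a power of M; the vector (0.08, 0.12, …, 0.12, 0.08) shown for the 40th hit is the stationary distribution of this chain, which repeated hits approach.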

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
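One simple way to learn transition probabilities from historical data is to count observed state-to-state transitions and normalize per source state. The sketch below assumes that trajectories are already discretized into one state per tick; the function name and the toy history are hypothetical, and the tutorial does not prescribe this particular estimator.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from state sequences.

    Returns a sparse representation {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


# Hypothetical history: one state per tick for two past trips.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s3", "s3", "s2", "s3"],
]
P = estimate_transitions(history)
```

Only transitions that were actually observed get an entry, which matches the remark that the matrix is very sparse.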

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: state graph over s1, s2 and s3, with edges labelled by the transition probabilities of M]

Note: we have an exponential number of possible paths the car might take

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

t=1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

t=1: (0, 0, 1)

t=2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

t=1: (0, 0, 1)

t=2: (0, 0.8, 0.2)

t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

t=1: (0, 0, 1)

t=2: (0, 0.8, 0.2), of which 0.8 (in s2) hits the window

t=3: (0, 0.16, 0.04), propagating only the worlds that have not hit the window yet; another 0.16 (in s2) hits the window

Result = 0.8 + 0.16 = 0.96

› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
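The build-up on the preceding slides can be written as a forward pass that moves probability mass into a hit counter as soon as it enters the window. This is a minimal sketch, with states s1, s2, s3 encoded as indices 0, 1, 2; the function names are illustrative.

```python
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def step(dist, M):
    """Row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def window_probability(start, M, window_states, t_from, t_to):
    """P(object is in one of window_states at some tick in [t_from, t_to])."""
    alive = list(start)   # mass of the worlds that have not hit the window yet
    hit = 0.0
    for t in range(t_to + 1):
        if t > 0:
            alive = step(alive, M)
        if t >= t_from:
            for s in window_states:
                hit += alive[s]   # these worlds are true hits from now on
                alive[s] = 0.0    # do not count them a second time
    return hit, alive


hit, alive = window_probability([1.0, 0.0, 0.0], M, {0, 1}, t_from=2, t_to=3)
```

The cost is one vector-matrix product per tick, instead of enumerating the exponentially many paths.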

103

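ST - Window Queries

States: s1, s2, s3, and the new state s4 for the trajectories that have hit the window.

\[
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

\[
(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)
\]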
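A sketch of how the two matrices can be derived mechanically from M: M− keeps M and adds an absorbing hit state, while M+ redirects every transition that lands in a window state to that hit state. The function name `augment` is an assumption, and the sketch does not treat the special case of an object that starts inside the window at a tick that belongs to T.

```python
def augment(M, window_states, in_window):
    """Add an absorbing 'hit' state (last index) to transition matrix M.

    in_window=False gives M-, used when the target tick is outside T.
    in_window=True gives M+, used when the target tick is inside T.
    """
    n = len(M)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    A[n][n] = 1.0                          # once hit, always hit
    for i in range(n):
        for j in range(n):
            if in_window and j in window_states:
                A[i][n] += M[i][j]         # redirect into the hit state
            else:
                A[i][j] = M[i][j]
    return A


def step(dist, M):
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus = augment(M, {0, 1}, in_window=False)
M_plus = augment(M, {0, 1}, in_window=True)

dist = [1.0, 0.0, 0.0, 0.0]
for matrix in (M_minus, M_plus, M_plus):   # transitions into t=1, t=2, t=3
    dist = step(dist, matrix)
result = dist[-1]
```

The query probability is then simply the last component of the final vector.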

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

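Multiple Observations

Classes, in this order: S1−, S2−, S3−, S1+, S2+, S3+ (“−”: the window has not been hit, ¬∎; “+”: the window has been hit, ∎)

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\]

\[
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0, 0, 0, 0)

t=1: (0, 0, 1, 0, 0, 0)

t=2: (0, 0, 0.2, 0, 0.8, 0)

t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)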

107

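Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting “the trajectory has hit the window” and obs denoting the observation in s3:

\[
P(\blacksquare \mid \mathrm{obs})
= \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}
\]

State vector at t=3 over (S1−, S2−, S3−, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
P(\blacksquare \mid \mathrm{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89
\]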
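The two-observation case can be sketched by doubling the state space into “−” and “+” copies and conditioning on the second observation at the end. The function name `doubled` and the index layout (first the “−” copies, then the “+” copies) are assumptions made for this illustration.

```python
def doubled(M, window_states, in_window):
    """Transition matrix over (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            A[n + i][n + j] = M[i][j]      # worlds that already hit stay in "+"
            if in_window and j in window_states:
                A[i][n + j] = M[i][j]      # first hit: move from "-" to "+"
            else:
                A[i][j] = M[i][j]          # still no hit
    return A


def step(dist, M):
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
n = len(M)
M_minus = doubled(M, {0, 1}, in_window=False)
M_plus = doubled(M, {0, 1}, in_window=True)

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # observed in s1 at t=0
for matrix in (M_minus, M_plus, M_plus):   # transitions into t=1, t=2, t=3
    dist = step(dist, matrix)

observed = 2                                # second observation: s3 at t=3
p_hit_and_obs = dist[n + observed]
p_miss_and_obs = dist[observed]
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
```

Keeping both copies of each state is what makes the final conditioning possible: the vector still knows in which state each class of worlds ends.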

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index structure over time and location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
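To illustrate why conforming samples are the hard part, here is a naive baseline: sample trajectories from the a-priori Markov chain and simply reject those that contradict an observation. This is not the adapted-transition-matrix technique of the cited paper; it is a rejection-sampling sketch on the three-state example (states encoded as 0, 1, 2), with hypothetical function names.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def sample_trajectory(M, start, steps, rng):
    """Draw one trajectory of the chain, starting in a known state."""
    traj = [start]
    for _ in range(steps):
        nxt = rng.choices(range(len(M)), weights=M[traj[-1]])[0]
        traj.append(nxt)
    return traj


def estimate(M, start, steps, predicate, n_samples, rng, observation=None):
    """Monte-Carlo estimate of P(predicate), optionally given an observation.

    observation: optional (tick, state); samples that contradict it are
    rejected, which wastes work when the observation is unlikely.
    """
    accepted = satisfied = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, steps, rng)
        if observation is not None and traj[observation[0]] != observation[1]:
            continue
        accepted += 1
        satisfied += predicate(traj)
    return satisfied / accepted if accepted else float("nan")


def in_window(traj):
    """True if the object is in s1 or s2 at tick 2 or 3."""
    return any(traj[t] in (0, 1) for t in (2, 3))


rng = random.Random(0)
p_window = estimate(M, 0, 3, in_window, 20000, rng)
p_given_s3 = estimate(M, 0, 3, in_window, 20000, rng, observation=(3, 2))
```

In this toy setting only about a third of the samples survive the rejection step; with many observations the acceptance rate drops quickly, which motivates sampling directly from a model that already incorporates the observations.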

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series,” SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.


59

Processing Spatio-Temporal Queries for Uncertain Trajectories

Need to jointly consider:

Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints

Processing Algorithms
- Impact of the motion+uncertainty coupling on filtering/pruning/refinement

e.g. possible routes to be taken

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (consequently, along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.

61

Continuous NN for trajectories

Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: the answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

=> What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First-level nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Nodes on deeper levels, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12] and [Tri121, t11, t121, D121], cover sub-intervals of the interval of their parent.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and uncertain objects Tr1 to Tr5 drawn as disks; Rmin and Rmax are marked, and for Tr4 it holds that Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a probability of 0 of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by integrating the pdf of Tri over the disk of radius Rd around Q (a double integral).

The probability that a given object Trj is the NN of Q is given by an integral over Rd: Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax…)
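The pruning observation and the NN probabilities can be checked numerically with a Monte-Carlo sketch instead of evaluating the integrals. The assumptions here are not taken from the slides: every object is uniformly distributed over its disk, Rmax is taken to be the smallest of the maximal possible distances to Q, and the function names are hypothetical.

```python
import math
import random


def sample_in_disk(cx, cy, r, rng):
    """Uniform sample inside a disk (square root keeps the area density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def nn_probabilities(q, disks, n_samples, rng):
    """Estimate P(object k is the NN of crisp point q) for disks (cx, cy, r)."""
    qx, qy = q
    center = [math.hypot(cx - qx, cy - qy) for cx, cy, _ in disks]
    # Assumed definition: Rmax = smallest maximal distance of any object to q.
    r_max = min(c + d[2] for c, d in zip(center, disks))
    # Prune every object whose minimal distance to q already exceeds Rmax.
    candidates = [k for k, (c, d) in enumerate(zip(center, disks))
                  if max(c - d[2], 0.0) <= r_max]
    wins = [0] * len(disks)
    for _ in range(n_samples):
        best, best_dist = None, float("inf")
        for k in candidates:
            x, y = sample_in_disk(*disks[k], rng)
            dist = math.hypot(x - qx, y - qy)
            if dist < best_dist:
                best, best_dist = k, dist
        wins[best] += 1
    return [w / n_samples for w in wins], candidates


rng = random.Random(42)
disks = [(2.0, 0.0, 1.0), (0.0, 3.0, 1.0), (10.0, 10.0, 1.0)]
probs, candidates = nn_probabilities((0.0, 0.0), disks, 10000, rng)
```

The third disk is pruned before any sampling happens, which mirrors the role of Tr4 in the slide.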

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query object TrQ and uncertain objects Tr1 to Tr5 drawn as disks, with Rmin and Rmax marked; two points Z1 and Z2 satisfy dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within distance Rd from Trq?

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each over a disk of diameter 2r]

Example: assuming a uniform pdf of the possible locations, with two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for the probability that the distance between Vi and Vq is at most Rd.

Define Viq = Vi – Vq (cross-correlation) as another random variable. Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
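The convolution argument can be illustrated on a deliberately simplified case: one-dimensional, discrete, uniformly distributed locations instead of 2D disks. The sketch computes the pmf of Viq = Vi − Vq for independent Vi and Vq and then the probability of being within a given distance; all names and numbers are illustrative assumptions.

```python
def uniform_pmf(lo, hi):
    """Uniform pmf over the integers lo..hi."""
    n = hi - lo + 1
    return {v: 1.0 / n for v in range(lo, hi + 1)}


def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent Vi and Vq.

    This is the convolution of pmf_i with the mirrored pmf of Vq.
    """
    out = {}
    for vi, pi in pmf_i.items():
        for vq, pq in pmf_q.items():
            out[vi - vq] = out.get(vi - vq, 0.0) + pi * pq
    return out


def within_distance(pmf_iq, rd):
    """P(|Vi - Vq| <= rd), read off the pmf of the difference."""
    return sum(p for v, p in pmf_iq.items() if abs(v) <= rd)


pmf_i = uniform_pmf(3, 5)    # object i is at 3, 4 or 5
pmf_q = uniform_pmf(0, 2)    # query object is at 0, 1 or 2
pmf_iq = difference_pmf(pmf_i, pmf_q)
p_close = within_distance(pmf_iq, 2)
expected_value = sum(v * p for v, p in pmf_iq.items())
```

Once the difference variable is available, the uncertain query object can be treated like a crisp query at the origin, which is the point of the transformation.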

67

[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the (x, y) plane, each over a disk of diameter 4r; axis ticks at -40, 0 and +40]

Let Viq = Vi + (–Vq)

Property 1
IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is the square root of a quadratic in t with leading coefficient A: a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: Example. Lower envelope LE12 of the two distance functions TR1 and TR2, with the critical time-points t11 and t12 marked after tb. NOTE: time dimension on the horizontal axis, dist on the vertical axis.]

70

The continuous aspect of the probabilistic NN queries

Q: how to efficiently construct the lower envelope of the whole collection of distance functions?

A: a divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
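For small inputs the lower envelope can be cross-checked with a naive baseline that is much simpler than the divide-and-conquer construction: collect all pairwise critical time-points, sort them, and pick the lowest function between consecutive points. The sketch assumes each squared distance function is a quadratic A·t² + B·t + C given as a tuple (A, B, C); comparing squared distances yields the same envelope. It runs in roughly O(N² log N) time and is not the O(N log N) algorithm referred to above.

```python
import math


def critical_times(f, g, tb, te):
    """Times in (tb, te) where two squared-distance quadratics are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in set(roots) if tb < t < te)   # at most two


def value(f, t):
    return f[0] * t * t + f[1] * t + f[2]


def lower_envelope(funcs, tb, te):
    """Naive envelope: list of (function index, from time, to time)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        best = min(range(len(funcs)), key=lambda k: value(funcs[k], mid))
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], hi)    # extend the interval
        else:
            envelope.append((best, lo, hi))
    return envelope


# Hypothetical squared distance functions of three objects.
funcs = [(1.0, -4.0, 5.0), (1.0, -12.0, 37.0), (0.25, -2.0, 8.0)]
envelope = lower_envelope(funcs, 0.0, 8.0)
```

Each entry of the result corresponds to one interval in which a single trajectory has the smallest distance, i.e. one node on the first level below the root of an answer structure such as the IPAC-NN tree.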

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2 and t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each path in PPi is selected with probability 1/|PPi|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t – ti (respectively ti+1 – t).
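The definition of the possible paths can be sketched as a depth-first enumeration of simple paths whose accumulated minimum travel time stays within the time budget ti+1 − ti. For simplicity the sketch assumes that both samples lie on vertices and that each edge carries its minimum travel time; the graph, the travel times and the function name are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum time cost is <= budget.

    graph: {vertex: {neighbour: minimum travel time of the edge}}
    Returns a list of (path as list of vertices, time cost).
    """
    paths = []

    def extend(path, cost):
        v = path[-1]
        if v == dst:
            paths.append((list(path), cost))
            return
        for w, c in graph[v].items():
            if w not in path and cost + c <= budget:
                path.append(w)
                extend(path, cost + c)
                path.pop()

    extend([src], 0.0)
    return paths


# Hypothetical road network with minimum travel times per edge.
graph = {
    "v1": {"v3": 2.0, "v5": 3.0},
    "v3": {"v1": 2.0, "v4": 2.0},
    "v4": {"v3": 2.0, "v5": 2.0},
    "v5": {"v1": 3.0, "v4": 2.0},
}
paths = possible_paths(graph, "v1", "v5", budget=6.0)
uniform_prob = 1.0 / len(paths)          # uniform pdf over the possible paths
tight = possible_paths(graph, "v1", "v5", budget=5.0)
```

With a budget of 6 both routes qualify and each gets probability 0.5 under the uniform assumption; with a budget of 5 only the direct edge remains.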

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4; the segment between m and p1 is marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. uniform pdf):

\[
\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(S \mid PL_p(t)\big)
\]
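The combination of the two probabilities can be sketched as a sum over the possible paths of path probability times the share of the possible locations on that path that falls into the segment. The sketch assumes a uniform pdf along each path and represents possible locations and segments as intervals of offsets along the path; the intervals below are hypothetical and merely chosen to reproduce the 0.5 · 0.5 of the example.

```python
def overlap_fraction(possible_locations, segment):
    """Share of the possible-location interval that lies inside the segment."""
    lo = max(possible_locations[0], segment[0])
    hi = min(possible_locations[1], segment[1])
    length = possible_locations[1] - possible_locations[0]
    return max(hi - lo, 0.0) / length if length > 0 else 0.0


def prob_in_segment(paths):
    """Sum over paths of Pr(path) * Pr(inside S | possible locations on path).

    paths: list of (Pr(path), possible-location interval, segment interval),
    where the segment interval is None if S does not lie on that path.
    """
    total = 0.0
    for p_path, possible_locations, segment in paths:
        if segment is not None:
            total += p_path * overlap_fraction(possible_locations, segment)
    return total


paths_at_t4 = [
    (0.5, (2.0, 6.0), (4.0, 6.0)),   # S covers half of the possible locations
    (0.5, (1.0, 5.0), None),         # S is not on the second path
]
p = prob_in_segment(paths_at_t4)
```

A non-uniform pdf along a path would replace `overlap_fraction` by the integral of that pdf over the overlap.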

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q. Annotations: expansion tree “bubbling” along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
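The network expansion step can be sketched as Dijkstra's algorithm with a cut-off at the query radius r. The sketch assumes that q is a vertex and only keeps edges that are reached completely within r; edges that are only partially covered would need extra handling. The graph and names are hypothetical.

```python
import heapq


def network_expansion(graph, q, r):
    """Expand from q along the network up to network distance r.

    graph: {vertex: {neighbour: edge length}}
    Returns (distance per reached vertex, sorted list of expansion-tree edges).
    """
    dist = {q: 0.0}
    parent = {}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                      # stale heap entry
        for w, length in graph[v].items():
            nd = d + length
            if nd < r and nd < dist.get(w, float("inf")):
                dist[w] = nd
                parent[w] = v
                heapq.heappush(heap, (nd, w))
    tree_edges = sorted((parent[w], w) for w in parent)
    return dist, tree_edges


# Hypothetical road network.
graph = {
    "A": {"B": 1.0, "E": 4.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"B": 2.0, "D": 5.0},
    "D": {"C": 5.0},
    "E": {"A": 4.0},
}
dist, tree_edges = network_expansion(graph, "A", r=4.0)
```

The edges of the resulting tree are the ones whose movement R-trees would be fetched in the filtering step.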

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches. Define acceptable error. Save on data size.

John Hershberger, Jack Snoeyink, “Speeding up Douglas-Peucker Line-simplification Algorithm,” SSHD 1997
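As an illustration of data reduction with an acceptable error, here is the classic recursive Douglas-Peucker line simplification. It is the textbook version, not the sped-up variant of the cited paper, and it ignores the time dimension of the samples; the track below is hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))             # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]    # everything in between is close enough
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right              # do not duplicate the split point


track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, epsilon=0.1)
```

The parameter `epsilon` plays the role of the acceptable error: every dropped point lies within that distance of the simplified polyline.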

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long should the sink tolerate data being “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)


Event queries

Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

113

Summary

rsaquo Models for spatial uncertainty

rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds

rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)

rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley

rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012

rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias

Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49

rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181

rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014

rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)

60

Example: inside range (disk) around “Q” between t1 and t2

1. What is the probability of a path taken?

2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?

Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]

[Figure: the trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, now with uncertainty, with the time instants T = tb, T = tb1, T = t1, T = t11 and T = te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

⇒ What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. The root is [TrQ, tb, te]. Its children are nodes of the form (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Each child in turn has children over sub-intervals of its own interval, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, and so on for the deeper levels, e.g. (Tri121, t11, t121, D121)]

Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q surrounded by the uncertainty disks of Tr1 … Tr5, with the radii Rmin, Rmax and Rd marked; one disk is annotated “Rmin > Rmax”]

Trq = 2D point Q. The rest are possible-locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e. Trj is the NN if it’s at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper-bound is Rmax…)
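The two formulas referred to on this slide were images and are not present in this text. A hedged reconstruction that matches the verbal description is sketched below; the symbol names P^{WD}_i and pdf^{WD}_j are assumptions chosen for illustration, not necessarily the slide's own notation.

```latex
% P^{WD}_i(R_d): probability that Tr_i lies within distance R_d of the crisp point Q
P^{WD}_i(R_d) \;=\; \iint_{\{(x,y)\,:\,\|(x,y)-Q\| \le R_d\}} pdf_i(x,y)\,dx\,dy

% pdf^{WD}_j(R_d) = \frac{d}{dR_d} P^{WD}_j(R_d) is the density of the distance of Tr_j from Q.
% Tr_j is the NN if it is at distance R_d and all others are farther away:
P^{NN}_j \;=\; \int_{0}^{R_{max}} pdf^{WD}_j(R_d)\,\prod_{i \ne j}\bigl(1 - P^{WD}_i(R_d)\bigr)\,dR_d
```

The first expression is a double integral over a disk around Q, which is why the next slide speaks of a quadruple integration once Q itself becomes uncertain.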

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

[Figure: uncertain query location TrQ and the uncertainty disks of Tr1 … Tr5, with Rmin and Rmax marked, and two possible locations Z1 and Z2]

Example: dist(Z1, Z2) < Rmax, so we can no longer “prune” Tr4 from consideration

The main problem now becomes: how to calculate the probability of an object, say Trj, being within_distance Rd from Trq?

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) plotted over the (x, y) plane; each is a cylinder of diameter 2r]

Example: assuming a uniform pdf of possible-locations; these are but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which due to the independence implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq
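The convolution argument can be illustrated on a discretized, one-dimensional toy case. The sketch below is only an illustration under the assumption that locations live on an integer grid; the function name `pmf_of_difference` and the example pmfs are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def pmf_of_difference(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent discrete V_i and V_q.

    This is the discrete convolution of pmf_i with the pmf of -V_q:
    every pair of outcomes (x_i, x_q) contributes p_i * p_q to x_i - x_q.
    """
    out = defaultdict(float)
    for x_i, p_i in pmf_i.items():
        for x_q, p_q in pmf_q.items():
            out[x_i - x_q] += p_i * p_q
    return dict(out)


# Hypothetical 1D example: both locations are uniform over two grid cells.
pmf_i = {0: 0.5, 1: 0.5}
pmf_q = {0: 0.5, 1: 0.5}
pmf_iq = pmf_of_difference(pmf_i, pmf_q)

# Probability that the two objects are within distance 0 of each other.
p_within_0 = sum(p for d, p in pmf_iq.items() if abs(d) <= 0)
```

Once the pmf of the difference is available, "Tri within distance Rd of Trq" reduces to "Viq within distance Rd of the origin", which is the crisp-query case again.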

67

[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) plotted over the (x, y) plane (axis range -40 to +40); each has a support of diameter 4r]

Let Viq = Vi + (–Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdf’s). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
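The distance formula itself was an image on the slide. A worked form consistent with the text is sketched below, assuming linear motion and writing (x_0, y_0) for the expected relative location at t = 0; the names x_0, y_0, B and C are assumptions introduced here.

```latex
d_{iq}(t) = \sqrt{(x_0 + V^{x}_{iq}\,t)^2 + (y_0 + V^{y}_{iq}\,t)^2}
          = \sqrt{A\,t^2 + B\,t + C}

A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2, \qquad
B = 2\,(x_0 V^{x}_{iq} + y_0 V^{y}_{iq}), \qquad
C = x_0^2 + y_0^2

% Squaring and completing the square gives the hyperbola form (for A > 0):
d_{iq}(t)^2 - A\Bigl(t + \frac{B}{2A}\Bigr)^2 = C - \frac{B^2}{4A}
```

When the right-hand side is zero the curve degenerates into two lines, which corresponds to the relative trajectory passing exactly through the origin.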

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points

[Figure: Example of the lower envelope LE12 of two distance-trajectories TR1 and TR2 (distance on the vertical axis, time on the horizontal axis), with the critical time-points t11 and t12 after tb]
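The "at most twice" observation follows from setting the squared distances equal, which leaves a polynomial of degree at most two. A minimal sketch, assuming each distance function is given by the coefficients (A, B, C) of its squared form d(t)^2 = A t^2 + B t + C; the function name and the example coefficients are hypothetical.

```python
import math


def critical_time_points(q1, q2, tb, te):
    """Times in [tb, te] at which two distance functions intersect.

    Each q = (A, B, C) encodes a squared distance d(t)^2 = A*t^2 + B*t + C.
    Since distances are non-negative, d1(t) = d2(t) is equivalent to
    d1(t)^2 = d2(t)^2, a polynomial equation of degree <= 2.
    Identical functions (all coefficients equal) yield an empty list here.
    """
    a, b, c = q1[0] - q2[0], q1[1] - q2[1], q1[2] - q2[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical inputs: d1(t)^2 = (t - 2)^2 + 1 and d2(t)^2 = 0.25*t^2 + 1.
d1 = (1.0, -4.0, 5.0)
d2 = (0.25, 0.0, 1.0)
points = critical_time_points(d1, d2, 0.0, 10.0)
```

Between two consecutive critical time-points the ordering of the two functions is fixed, so the lower envelope only needs to be re-evaluated at these points.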

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?

A: A divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: three distance-versus-time plots over [tb, te]: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12), the lower envelope LE34 of TR3 and TR4 (critical time-point t31), and their merge LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions TR1 … TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 4r pruning band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.

Example: if the pdf of selecting a possible path is uniform

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t), i.e.

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4, with segment mp1 marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. uniform pdf):

\[
\Pr(a \in S \text{ at } t) \;=\; \sum_{p \,\in\, PP(a)} \Pr(p) \cdot \Pr\bigl(a \in S \mid PL_p(t)\bigr)
\]

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree “bubbles” along road-network segments; annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
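To make the "define an acceptable error, save on data size" idea concrete, here is a minimal sketch of the plain recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper, it treats the points as purely spatial (time is ignored), and the sample `track` and `epsilon` values are made up for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, epsilon):
    """Drop points while keeping every dropped point within epsilon of the result."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical location samples; epsilon is the acceptable error.
track = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2), (5, 0)]
simplified = douglas_peucker(track, epsilon=0.1)
```

The acceptable error epsilon then plays the role of a known uncertainty bound on every location that was not transmitted.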

79

Uncertainty – the other-flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long should the data at the sink be allowed to be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t) on a highway, with the region reachable between the minimum and maximum speed (vmin, vmax) and the query window Q]

What is the probability that the car is in some area at least once during some time?

Assuming uniform distribution… the figure marks probabilities of 50% and 30% for the two time instants covered by Q

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley

91

A simple example

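› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; from an inner hole the ball moves left, stays, or moves right with probabilities 0.4, 0.2, 0.4; at a border hole the two probabilities are 0.6 and 0.4]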

92

A simple example

› Initial position

› After first hit: 0.4 0.2 0.4

› After second hit: 0.16 0.16 0.36 0.16 0.16

› After 40th hit: 0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08

93

How can we model this

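› A Markov Chain is a “memoryless” Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]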

94

How can we model this

› First hit

\[
(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)
\]

› Second hit

\[
(0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)
\]

› 40th hit

\[
(0\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)
\]

where M is the 9 × 9 transition matrix of the previous slide
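The hit-by-hit evolution can be reproduced with a few lines of plain Python. This is a minimal sketch using the transition probabilities given on the slides; the helper names `step` and `build_board_matrix` are made up for illustration, and no external library is assumed.

```python
def step(dist, M):
    """One transition: row vector dist times matrix M (M[i][j] = P(i -> j))."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def build_board_matrix(n=9):
    """Transition matrix of the nine-bucket board as given on the slide."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                            # inner bucket: left, stay, right
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
after_1 = step(start, M)
after_2 = step(after_1, M)

after_40 = start
for _ in range(40):
    after_40 = step(after_40, M)

# The vector shown for the 40th hit; it is a fixed point of M
# (a stationary distribution), i.e. step(pi, M) reproduces pi.
pi = [0.08] + [0.12] * 7 + [0.08]
pi_next = step(pi, M)
```

Repeated vector-matrix products avoid forming M^40 explicitly, which is also how the sparse-matrix formulation in the later slides is meant to scale.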

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3, with edges labelled by the transition probabilities]

Note: We have an exponential number of possible paths the car might take

Starting in s1 at t=0, the state distribution (s1, s2, s3) evolves as
– t=0: (1, 0, 0)
– t=1: (0, 0, 1)
– t=2: (0, 0.8, 0.2)
– t=3: (0.48, 0.16, 0.36)

Once the worlds that have already hit the window at t=2 are set aside, the vector at t=3 becomes (0, 0.16, 0.04). Result = 0.96 (0.8 hitting at t=2 plus 0.16 hitting at t=3)

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

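ST - Window Queries

States (s1, s2, s3, s4), where s4 is the new absorbing state that collects the winner trajectories. M− is applied for time steps outside the query interval, M+ for time steps inside it:

\[
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

– t=0: (1, 0, 0, 0)
– t=1: (0, 0, 1, 0)
– t=2: (0, 0, 0.2, 0.8)
– t=3: (0, 0, 0.04, 0.96)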
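The two augmented matrices can be built mechanically from M and the query window. The following is a minimal sketch of that construction for the three-state example; the function names, the zero-based state indices and the handling of a window that already covers t = 0 are assumptions made for this illustration.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_query_probability(M, start, window_states, window_times, t_end):
    """P(object is in a window state at least once during window_times).

    M_minus keeps the original transitions and adds an absorbing "hit"
    state; M_plus additionally redirects every transition that lands in
    a window state to the hit state.
    """
    n = len(M)
    hit = n  # index of the extra absorbing state (s4 in the example)
    M_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = hit if j in window_states else j
            M_plus[i][target] += M[i][j]
    M_plus[hit][hit] = 1.0

    v = [0.0] * (n + 1)
    # Assumption: if the window already covers t = 0, the start counts as a hit.
    if 0 in window_times and start in window_states:
        v[hit] = 1.0
    else:
        v[start] = 1.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v[hit], v


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# States are zero-based here: s1 -> 0, s2 -> 1, s3 -> 2.
prob, final = window_query_probability(
    M, start=0, window_states={0, 1}, window_times={2, 3}, t_end=3)
```

The cost is one sparse vector-matrix product per time step, instead of enumerating the exponentially many possible paths.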

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

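Classes of equivalent worlds in the order (S1-, S2-, S3-, S1+, S2+, S3+); the “-” classes hold the worlds that have not hit the window (not ∎), the “+” classes those that have (∎):

– t=0: (1, 0, 0, 0, 0, 0)
– t=1: (0, 0, 1, 0, 0, 0)
– t=2: (0, 0, 0.2, 0, 0.8, 0)
– t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\]

\[
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]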

107

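Bayes’ Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory has hit the window and obs the observation of the object in s3:

\[
P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)}
= \frac{P(\blacksquare \wedge obs)}{P(obs)}
= \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
\]

Vector over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89
\]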
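The doubled state space and the Bayes step can be checked numerically. This is a minimal sketch for the three-state example; the function names and zero-based indices are assumptions, and the second observation is taken to be "seen in s3 at t = 3" because the slide conditions on the vector at t = 3.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def doubled_matrices(M, window_states):
    """Build the 2|S| x 2|S| matrices over the classes (S1-..Sn-, S1+..Sn+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Outside the query time no world changes its class.
            M_minus[i][j] = M[i][j]
            M_minus[n + i][n + j] = M[i][j]
            # Inside the query time, landing in a window state moves a
            # not-yet-hit world into the corresponding "+" class.
            if j in window_states:
                M_plus[i][n + j] = M[i][j]
            else:
                M_plus[i][j] = M[i][j]
            M_plus[n + i][n + j] = M[i][j]
    return M_minus, M_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = doubled_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # t=0: object in s1, window not hit yet
v = vec_mat(v, M_minus)              # t=1 lies outside T = [2,3]
v = vec_mat(v, M_plus)               # t=2
v = vec_mat(v, M_plus)               # t=3

# Bayes: condition on the (assumed) second observation "seen in s3 at t=3".
s3 = 2
p_hit_and_obs = v[3 + s3]            # class S3+
p_miss_and_obs = v[s3]               # class S3-
p_hit_given_obs = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
```

Keeping both "-" and "+" copies of every state is what makes the conditioning possible: the single absorbing state of the previous slide would forget where the hit worlds currently are.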

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-Space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: location over time, with index entries labelled “Poid”]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
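The "ratio of satisfying samples" idea can be sketched on the three-state example from the window-query slides. Note the limits of this sketch: it only forward-samples from a single observation at t = 0 and does not implement the adapted transition matrices of the cited paper, which are needed to make samples conform to later observations. Function names, sample size and seed are arbitrary choices for illustration.

```python
import random


def sample_path(M, start, steps, rng):
    """Draw one trajectory (list of state indices) from the Markov chain M."""
    path = [start]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def estimate_window_probability(M, start, window_states, window_times,
                                n_samples, seed=0):
    """Fraction of sampled trajectories that satisfy the window query."""
    rng = random.Random(seed)
    t_end = max(window_times)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start, t_end, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Zero-based states: s1 -> 0, s2 -> 1, s3 -> 2; query window {s1, s2} x [2,3].
estimate = estimate_window_probability(
    M, start=0, window_states={0, 1}, window_times={2, 3}, n_samples=20000)
```

With 20000 samples the estimate should land close to the exact value 0.96 obtained by matrix multiplication; the same sampling loop works for queries, such as nearest-neighbor queries, for which no such closed matrix formulation is given in the slides.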

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 61: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

61

Continuous NN for trajectories

Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:

Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]

[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked]

Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]

Observation: The answer to the query is time-parameterized

Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

62

Impact of Uncertainty

Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:

UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]

[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, tb1, t1, t11 and te marked]

Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability

⇒ What should the structure of the answer to UQ_nn(q) look like?

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Each node has children of the same form over sub-intervals of its own interval, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12], … below the first node, and [Tri121, t11, t121, D121] one level further down]

Root: the parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]

Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1, …, Tr5; Rmin, Rmax and a distance Rd1 are marked; for the pruned object, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles

Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

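The probability that the location of Tri is within distance Rd from Q is given by

$$ P^{WD}_{i,Q}(R_d) = \iint_{D(Q, R_d)} pdf_i(x, y)\, dx\, dy $$

where D(Q, Rd) is the disk of radius Rd centered at Q and pdf_i is the location pdf of Tri.

The probability that a given object Trj is the NN of Q is given by

$$ P^{NN}_{j,Q} = \int_{0}^{R_{max}} pdf^{WD}_{j,Q}(R_d) \prod_{i \neq j} \left(1 - P^{WD}_{i,Q}(R_d)\right) dR_d $$

where pdf^{WD}_{j,Q} is the derivative of P^{WD}_{j,Q} with respect to Rd; i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax)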

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:

[Figure: uncertain query disk TrQ and the uncertainty disks of Tr1, …, Tr5; Rmin and Rmax are marked, as are two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)

66

[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane; each is a cylinder of diameter 2r and height 1/(πr²)]

Example: assuming a uniform pdf of the possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq

Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(|Vi − Vq| ≤ Rd)

Define Viq = Vi − Vq (cross-correlation) as another random variable. Since Viq = Vi + (−Vq), the independence of Vi and Vq implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq
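The convolution argument can be illustrated on a discretized toy case. The following is a minimal sketch, assuming (this is not on the slide) that both locations are given as probability mass functions over grid cells; the function names and the data are hypothetical.

```python
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of Viq = Vi - Vq for independent Vi, Vq, each given as {(x, y): prob}.

    This is the discrete convolution of the pmf of Vi with the pmf of -Vq.
    """
    out = defaultdict(float)
    for (xi, yi), pi in pmf_i.items():
        for (xq, yq), pq in pmf_q.items():
            out[(xi - xq, yi - yq)] += pi * pq
    return dict(out)

def prob_within(pmf_iq, rd):
    """P(|Viq| <= rd): mass of the difference pmf inside the disk of radius rd."""
    return sum(p for (x, y), p in pmf_iq.items() if x * x + y * y <= rd * rd)

# Hypothetical toy data: each object is uniform over four grid cells.
pmf_i = {(2, 0): 0.25, (3, 0): 0.25, (2, 1): 0.25, (3, 1): 0.25}
pmf_q = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25}
pmf_iq = difference_pmf(pmf_i, pmf_q)
print(prob_within(pmf_iq, 2.0))
```

Once the difference pmf is available, the within-distance probability only needs a single sum around the origin, which mirrors the reduction from the quadruple to the double integration.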

67

[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane; each has a support of diameter 4r and a peak value of 3/(4πr²)]

Let Viq = Vi + (−Vq)

Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around the vertical axis (the axis of the pdf values) through their centroids. THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$$ d_{iq}(t) = \sqrt{A t^2 + B t + C} $$

which is a hyperbola, since A > 0

Now we have a collection of such distance functions (one for each i)
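As a sketch of where the hyperbolic shape comes from, assume (this notation is not on the slide) that the expected location along TRiq is at (x_0, y_0) at t = 0 and moves with the constant relative velocity (Vx_iq, Vy_iq). Under that assumption the coefficients work out as follows:

```latex
\begin{align*}
d_{iq}(t) &= \sqrt{(x_0 + Vx_{iq}\, t)^2 + (y_0 + Vy_{iq}\, t)^2}
           = \sqrt{A t^2 + B t + C}, \\
A &= Vx_{iq}^2 + Vy_{iq}^2, \qquad
B = 2\,(x_0\, Vx_{iq} + y_0\, Vy_{iq}), \qquad
C = x_0^2 + y_0^2 .
\end{align*}
```

A is positive whenever the relative velocity is non-zero, and squaring gives d² − At² − Bt = C, the equation of a hyperbola branch in the (t, d) plane.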

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides leaves a quadratic equation in t). Call such pair-wise intersections critical time-points

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with the critical time-points t11 and t12 marked after tb; distance on the vertical axis, time on the horizontal axis]
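The "at most two intersections" observation can be checked with a small sketch. It assumes each distance function has the form sqrt(A t² + B t + C) and is passed as a hypothetical (A, B, C) triple; it only handles the two-function base case, not the full divide-and-conquer construction.

```python
import math

def critical_times(f, g, tb, te):
    """Times strictly inside (tb, te) where two distance functions are equal.

    f and g are (A, B, C) triples of sqrt(A t^2 + B t + C); equating the
    squared distances leaves a quadratic, hence at most two roots.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb < t < te]

def squared(f, t):
    """Squared distance at time t (enough for comparisons, sqrt is monotone)."""
    return f[0] * t * t + f[1] * t + f[2]

def lower_envelope_of_two(f, g, tb, te):
    """Pieces (name, start, end) of the lower envelope of f and g over [tb, te]."""
    cuts = [tb] + critical_times(f, g, tb, te) + [te]
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2
        name = "f" if squared(f, mid) <= squared(g, mid) else "g"
        if pieces and pieces[-1][0] == name:
            # The curves only touched here, so the same one stays lowest.
            pieces[-1] = (name, pieces[-1][1], end)
        else:
            pieces.append((name, start, end))
    return pieces

print(lower_envelope_of_two((1, 0, 1), (1, -4, 5), 0, 3))
```

Each piece corresponds to one (trajectory, interval) entry of the kind stored in the first level of the answer structure.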

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions TR1, …, TR7 over [tb, te] with the critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) are marked]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti

Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t)

73

Combine the two probabilitieshellip

[Figure: the possible paths of object a and its possible locations when t = 4]

Probability of falling inside segment m-p1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g. with a uniform pdf):

$$ \Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(S \mid PL_p(t)) $$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign a pointer to the set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distance to q is smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search the leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and the query point q; the expansion tree "bubbles" along road-network segments; annotations mark the earliest/latest times of arrival and a possible path that is definitely not part of the answer to the range query]
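The network expansion step can be sketched as a Dijkstra search that stops at radius r. This is a minimal illustration under stated assumptions, not the authors' implementation: the query point q is assumed to coincide with a vertex, the graph is a hypothetical adjacency dictionary, and an edge counts as "within r" when at least one of its endpoints is closer than r.

```python
import heapq

def network_expansion(graph, q, r):
    """Return (dist, edges): vertices closer than r to q and the edges touching them.

    graph maps a vertex to {neighbour: edge_length}; both directions are listed.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # An edge belongs to the expansion if one endpoint was reached within r.
    edges = {frozenset((u, v)) for u in dist for v in graph.get(u, {})}
    return dist, edges

# Hypothetical road network.
graph = {
    "q": {"A": 1.0, "D": 4.0},
    "A": {"q": 1.0, "B": 2.0},
    "B": {"A": 2.0, "C": 5.0},
    "C": {"B": 5.0, "E": 1.0},
    "D": {"q": 4.0},
    "E": {"C": 1.0},
}
dist, edges = network_expansion(graph, "q", 3.5)
print(sorted(dist.items()))
```

In the filtering step, only the movement R-trees of the returned edges would have to be fetched.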

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
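To make "define an acceptable error, save on data size" concrete, here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and it treats the track as a purely spatial 2D polyline (time is ignored); the sample points are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed so that no dropped point is further than eps."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1  # index of farthest point
    if dists[k - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:k + 1], eps)
    right = douglas_peucker(points[k:], eps)
    return left[:-1] + right  # the split point appears once

track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(track, 0.5))
```

The tolerance eps plays the role of the acceptable error: every dropped location is guaranteed to lie within eps of the simplified polyline.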

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink stay "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: The independence assumption prohibits time-parameterized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q]

What is the probability that the car is in some area at least once during some time?

84

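A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q and the region reachable between the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?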

85

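A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q and the region reachable between the speed bounds vmin and vmax]

Assuming uniform distribution…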

86

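A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65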

87

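A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints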

88

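A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5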

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole: left 0.4, stay 0.2, right 0.4; border hole: move to the neighbouring hole 0.6, stay 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4 (holes around the initial position)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

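› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$ M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix} $$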

94

How can we model this

› First hit

$$ (0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) $$

› Second hit

$$ (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) \cdot M = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0) $$

› 40th hit

$$ (0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08) $$

where M is the 9 × 9 transition matrix of the previous slide
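The hit-by-hit computation is a repeated vector-matrix product. The sketch below rebuilds the 9-bucket transition matrix from the slide and propagates the initial distribution; the helper names are hypothetical and plain Python lists are used instead of a matrix library.

```python
def step(dist, matrix):
    """One hit: multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def build_board_matrix(n=9):
    """Transition matrix of the board: interior 0.4/0.2/0.4, borders 0.4 stay, 0.6 move."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third bucket
first = step(start, M)
second = step(first, M)
dist40 = start
for _ in range(40):
    dist40 = step(dist40, M)

print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in dist40])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain, which is why the distribution after many hits approaches it regardless of the starting bucket.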

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

[Figure: state graph over s1, s2 and s3, with the transition probabilities of M as edge labels]

Note: We have an exponential number of possible paths the car might take

ST - Window Queries

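› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

State distribution over (s1, s2, s3):
t = 0: (1, 0, 0)

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]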

98

99

ST - Window Queries

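› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

State distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]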

100

ST - Window Queries

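› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

State distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]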

101

ST - Window Queries

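› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

State distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]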

102

ST - Window Queries

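› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

State distribution over (s1, s2, s3):
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3, after discarding the worlds that already hit the window at t = 2: (0, 0.16, 0.04)

Result = 1 − 0.04 = 0.96

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as edge labels]

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices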

103

ST - Window Queries

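$$ M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

State distribution over (s1, s2, s3, s4), where s4 is the new state for the winner trajectories:
t = 0: (1, 0, 0, 0)
t = 1: (0, 0, 1, 0)
t = 2: (0, 0, 0.2, 0.8)
t = 3: (0, 0, 0.04, 0.96)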
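The two-matrix scheme can be sketched in a few lines. The matrices are the ones from the slide; the function name and the convention that the transition into a time instant inside the window uses M+ while all other transitions use M− are assumptions of this sketch (it also assumes that the start time itself lies outside the window).

```python
def step(dist, matrix):
    """Multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

# States s1, s2, s3 plus the absorbing "winner" state s4.
M_MINUS = [
    [0, 0, 1, 0],
    [0.6, 0, 0.4, 0],
    [0, 0.8, 0.2, 0],
    [0, 0, 0, 1],
]
# Transitions into the window states s1, s2 are redirected to s4.
M_PLUS = [
    [0, 0, 1, 0],
    [0, 0, 0.4, 0.6],
    [0, 0, 0.2, 0.8],
    [0, 0, 0, 1],
]

def window_probability(start, window_times, t_end):
    """Probability mass absorbed in s4 after t_end ticks."""
    dist = list(start)
    for t in range(1, t_end + 1):
        dist = step(dist, M_PLUS if t in window_times else M_MINUS)
    return dist[-1]

p = window_probability([1, 0, 0, 0], {2, 3}, 3)
print(round(p, 2))
```

The cost is one vector-matrix product per tick, so the exponential number of possible paths never has to be enumerated.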

104

Multiple Observations

› So far, we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations, we have to introduce more artificial states and adapt the techniques

[Figure: location space over time for a single observation at t0, and for two observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

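State distribution over (S1−, S2−, S3−, S1+, S2+, S3+), where the "−" classes have not (¬hit) and the "+" classes have (hit) intersected the window:
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$ M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]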

107

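Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$ P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs} \wedge \text{hit}) + P(\text{obs} \wedge \neg\text{hit})} $$

Here, hit denotes the event that the trajectory intersects the query window, and obs the event that the object is seen in s3.

With the state distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

$$ P(\text{hit} \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89 $$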
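The conditioning step can be reproduced with the doubled state space. The sketch below builds the 2|S| × 2|S| matrices from the 3 × 3 matrix M of the example; the function names are hypothetical, and it assumes the second observation (object seen in s3) is made at t = 3.

```python
def step(dist, matrix):
    """Multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}  # s1 and s2

def doubled_matrices(m, window_states):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m[i][j]          # not hit stays not hit
            m_minus[n + i][n + j] = m[i][j]  # hit stays hit
            m_plus[n + i][n + j] = m[i][j]   # hit stays hit
            # Entering a window state moves a world from "-" to "+".
            target = n + j if j in window_states else j
            m_plus[i][target] = m[i][j]
    return m_minus, m_plus

m_minus, m_plus = doubled_matrices(M, WINDOW_STATES)
dist = [1, 0, 0, 0, 0, 0]        # observed in s1 at t = 0
dist = step(dist, m_minus)       # t = 1, outside the window
dist = step(dist, m_plus)        # t = 2, inside the window
dist = step(dist, m_plus)        # t = 3, inside the window

# Condition on the object being seen in s3 at t = 3 (classes S3- and S3+).
posterior = dist[5] / (dist[5] + dist[2])
print(round(posterior, 2))
```

Keeping the hit and not-hit worlds in separate classes is what makes the Bayes step a simple ratio of two vector entries.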

108

Summary

› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) in the location/time space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
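The sampling idea can be sketched on the three-state example used for the window queries. This minimal sketch draws paths from the unadapted chain with a single observation at t = 0; it does not show the adaptation of transition matrices needed to make samples conform to later observations, and all names are hypothetical.

```python
import random

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]

def sample_path(matrix, start, steps, rng):
    """Draw one possible trajectory (sequence of state indices) from the chain."""
    path = [start]
    for _ in range(steps):
        row = matrix[path[-1]]
        u, acc = rng.random(), 0.0
        nxt = len(row) - 1  # fallback against rounding at the upper end
        for j, p in enumerate(row):
            acc += p
            if u < acc:
                nxt = j
                break
        path.append(nxt)
    return path

def estimate_window_probability(n_samples, seed=0):
    """Fraction of sampled paths that are in s1 or s2 at t = 2 or t = 3."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, 0, 3, rng)
        if any(path[t] in (0, 1) for t in (2, 3)):
            hits += 1
    return hits / n_samples

print(estimate_window_probability(20000))
```

For this query the exact answer is available from the matrix approach (0.96), so the estimate can be compared against it; for queries such as kNN, where no such closed form is at hand, the same sampling loop is the fallback.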

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728

Page 62: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

62

Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant

UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]

T = te

X

YT = tb

T = t1

Trq Tr1 Tr2Tr3

T = tb1

T = t11

Observations

adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability

= what should the structure ofthe answer to UQ_nn(q) look like

63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree

Tri11 tb t11 D11

Tri12l t12(l-1) t12 D12k12

[TrQ tb te]

Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm

Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm

Tri121 t11 t121 D121

Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]

Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)

Rmin gt Rmax

Rmin

Rmax

Q

Tr1

Tr2

Tr3

Tr4

Tr5

Rd1

1

14

Trq = 2D point Q The rest are possible-locations bounded by circles

Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)

65

When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example

Rmin

Rmax

TrQ

Tr1

Tr2

Tr3

Tr4

Tr5

Z1

Z2

dist(Z1Z2) lt Rmax

We can no longer ldquoprunerdquo Tr4 from consideration

The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq

NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points.
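The "at most twice" observation follows because equating two distance functions and squaring both sides leaves a quadratic in t. A minimal sketch, assuming each distance function is stored as a hypothetical triple (A, B, C) of the coefficients under the square root:

```python
import math

def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions coincide.

    f and g are (A, B, C) triples describing sqrt(A t^2 + B t + C).
    Identical functions (infinitely many intersections) yield an empty list.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        # Same quadratic coefficient: the equation degenerates to a linear one.
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

# Made-up example: one object passes the query at distance 1 (closest at t = 5),
# another keeps a constant distance of 2.
times = critical_times((1.0, -10.0, 26.0), (0.0, 0.0, 4.0), 0.0, 10.0)
```

Here the two functions cross at t = 5 ± √3, so `times` holds two critical time-points.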

[Figure: example of the lower envelope LE12 of two distance functions TR1 and TR2, with critical time-points t11 and t12. NOTE: the time dimension is on the horizontal axis, the distance on the vertical axis.]

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort.

[Figure: the lower envelopes LE12 (of TR1 and TR2, with critical time-points t11 and t12) and LE34 (of TR3 and TR4, with critical time-point t31) are merged into LE1234 over [tb, te], introducing the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
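For illustration only, the lower envelope can also be computed naively by cutting [tb, te] at every pairwise critical time-point and picking the smallest function on each piece. This sketch is not the O(N log N) divide-and-conquer construction from the slide (it examines all pairs), and the (A, B, C) representation of a distance function is an assumption.

```python
import math
from itertools import combinations

def dist(f, t):
    """Value of the distance function sqrt(A t^2 + B t + C) at time t."""
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))

def crossings(f, g):
    """Real solutions of f(t) = g(t); at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2 * a), (-b + s) / (2 * a)]

def lower_envelope(funcs, tb, te):
    """Return the envelope as a list of (start, end, index of the lowest function)."""
    cuts = {tb, te}
    for f, g in combinations(funcs, 2):
        cuts.update(t for t in crossings(f, g) if tb < t < te)
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        best = min(range(len(funcs)), key=lambda i: dist(funcs[i], mid))
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], hi, best)  # same owner: extend the piece
        else:
            pieces.append((lo, hi, best))
    return pieces

# Made-up functions: index 0 dips to distance 1 at t = 5, index 1 stays at 2.
pieces = lower_envelope([(1.0, -10.0, 26.0), (0.0, 0.0, 4.0)], 0.0, 10.0)
```

Between consecutive cuts no two functions cross, so evaluating at the midpoint is enough to decide which one is lowest on the whole piece.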

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.

[Figure: distance functions TR1–TR7 over [tb, te] with critical time-points t1, t2 and t3, showing the lower envelope, the 4r pruning band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.

Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
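The definition of possible paths can be sketched as a budgeted depth-first enumeration. This is a simplified illustration: it assumes both samples lie on vertices, and the network, the vertex names and the minimum travel times are made up.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time fits the budget.

    graph: dict vertex -> list of (neighbour, minimum travel time).
    Returns a list of (path, cost) pairs.
    """
    paths = []

    def dfs(vertex, path, cost):
        if cost > budget:
            return  # cannot reach dst in time along this prefix
        if vertex == dst:
            paths.append((list(path), cost))
            return
        for nxt, t in graph.get(vertex, []):
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path, cost + t)
                path.pop()

    dfs(src, [src], 0)
    return paths

# Hypothetical road network.
graph = {
    "v1": [("v5", 3), ("v3", 2)],
    "v3": [("v4", 2)],
    "v4": [("v5", 1)],
    "v5": [],
}
pp = possible_paths(graph, "v1", "v5", budget=6)   # budget = t_{i+1} - t_i
uniform_path_probability = 1 / len(pp)
```

With a budget of 6 both routes qualify; shrinking the budget to 4 leaves only the direct edge.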

73

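Combine the two probabilities…

[Figure: the possible paths of object a and its possible locations when t = 4, with a segment m·p1 highlighted]

Probability of falling inside segment m·p1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

\[ \Pr\bigl(a(t) \in S\bigr) = \sum_{p \in PP^{a}(t)} \Pr(p) \cdot \Pr\bigl(S \mid PL_{p}(t)\bigr) \]

where Pr(p) is the probability of path p and the second factor is the probability of lying in S among the possible locations PL on p at time t (e.g. uniform pdf).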
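The combination rule is a weighted sum over the possible paths. A minimal sketch, where the per-path numbers are made up to reproduce the 0.5 × 0.5 example:

```python
def prob_in_segment(paths):
    """paths: list of (probability of the path, probability of being in S given that path)."""
    return sum(p_path * p_loc for p_path, p_loc in paths)

# Two equally likely paths. On the first, half of the possible locations at t = 4
# lie inside the segment; on the second, none do (assumed split for illustration).
p = prob_in_segment([(0.5, 0.5), (0.5, 0.0)])
```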

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A–E and query location q; the expansion tree is "bubbling" along road-network segments; vertices are annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
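As an illustration of trading an acceptable error for data size, here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is not the sped-up variant of the cited paper, it ignores the time dimension, and the sample track is made up.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Drop points that deviate at most eps from the chord of their sub-polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = max(
        ((i, point_segment_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if dmax <= eps:
        return [first, last]
    # Keep the farthest point and simplify both halves independently.
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
reduced = douglas_peucker(track, eps=0.1)
```

In this example only the point (1, 0.05) is dropped, since it is the only one within the 0.1 tolerance of its chord.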

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– Is it not just a "Z" axis? … One cannot travel back in time
– How long should the sink allow the data to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443

83

A highway example…

[Figure, built up over slides 83–88: position (pos) of a car over time (t), the query window Q, and the region reachable under the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

Assuming a uniform distribution, the car is inside Q with probability 50% at one time step and 30% at another.

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

This treats the two time steps as unrelated and thereby counts worlds that violate the speed constraints.

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

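A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a board with a row of holes; in the interior the ball stays with probability 0.2 and moves to either neighbour with probability 0.4; at the border it stays with probability 0.4 and moves inwards with probability 0.6]

92

A simple example

› Initial position: the ball is in one hole with probability 1

› After first hit: 0.4, 0.2, 0.4 (left neighbour, same hole, right neighbour)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08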

93

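How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]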

94

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

where M is the 9 × 9 transition matrix of the previous slide.
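The repeated hits can be replayed with plain vector-matrix products. This is a minimal sketch in pure Python; the helper names are hypothetical, and the matrix is the 9-bucket transition matrix of the board example.

```python
def step(dist, m):
    """One hit: multiply the row vector dist by the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

m = board_matrix()
history = [[0, 0, 1, 0, 0, 0, 0, 0, 0]]   # ball starts in the third bucket
for _ in range(40):
    history.append(step(history[-1], m))

# history[1] and history[2] are the distributions after the first and second hit;
# history[40] is the distribution after the 40th hit.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

The vector `stationary` is left unchanged by `step`, which is why the distribution after many hits approaches the rounded values shown on the slide.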

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: state diagram over s1, s2 and s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3 on slides 98–102]

Note: We have an exponential number of possible paths the car might take.

Starting in s1 at t = 0 and multiplying by M at each step:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)

To answer the query, worlds that have already hit the window must not be counted twice. At t = 2 the car is in s2 with probability 0.8, so these worlds are winners. Only the remaining mass (0, 0, 0.2) is propagated, giving (0, 0.16, 0.04) at t = 3, of which 0.16 enters s2.

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

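ST - Window Queries

States: s1, s2, s3 and the new winner state s4.

\[
M^{-} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0.6 & 0 & 0.4 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.2 & 0.8 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

\[
(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)
\]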
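The two-matrix scheme can be sketched as follows. The helper `augment` is a hypothetical name; it builds M− (no redirection) or M+ (transitions into window states are redirected to the absorbing winner state) from the 3 × 3 matrix M of the example.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def augment(m, window, hit):
    """Add an absorbing winner state as the last state.

    If hit is True, probability mass entering a window state goes to the winner state.
    """
    n = len(m)
    out = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if hit and j in window:
                new_row[n] += p
            else:
                new_row[j] += p
        out.append(new_row)
    out.append([0.0] * n + [1.0])   # winners stay winners
    return out

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
window_states = {0, 1}   # s1 and s2
window_times = {2, 3}    # T = [2, 3]

m_minus = augment(M, window_states, hit=False)
m_plus = augment(M, window_states, hit=True)

v = [1.0, 0.0, 0.0, 0.0]   # car observed in s1 at t = 0
for t in range(1, 4):
    # Use M+ for transitions arriving at a time inside the query window.
    v = vec_mat(v, m_plus if t in window_times else m_minus)

result = v[-1]   # probability mass in the winner state
```

The mass in the winner state after the last step is the query result, 0.96 in this example, obtained without enumerating individual paths.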

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time; first with a single observation at t0, then with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

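Multiple Observations

State order: (S1−, S2−, S3−, S1+, S2+, S3+), where the + classes contain the worlds that have hit the window (∎) and the − classes those that have not (¬∎).

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)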

107

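Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (event ∎), given the fact that the object was seen in s3 (event obs)?

\[
P(\blacksquare \mid \mathrm{obs})
= \frac{P(\mathrm{obs} \mid \blacksquare)\, P(\blacksquare)}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}
= \frac{0.32}{0.32 + 0.04} \approx 0.89
\]

The two values are the S3+ and S3− entries of the vector at t = 3, which in the order (S1−, S2−, S3−, S1+, S2+, S3+) is (0, 0, 0.04, 0.48, 0.16, 0.32).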
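The conditioning step can be sketched end to end. The helper `doubled` is a hypothetical name that builds the 2|S| × 2|S| matrices M− and M+ over the classes (S1−, S2−, S3−, S1+, S2+, S3+); the observation in s3 is assumed to be made at t = 3.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def doubled(m, window, hit):
    """Matrix over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            # Worlds that have not hit yet switch to the + class when entering the window.
            target = j + n if (hit and j in window) else j
            out[i][target] += p
            # Worlds that have already hit stay within the + classes.
            out[i + n][j + n] += p
    return out

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
window_states, window_times = {0, 1}, {2, 3}

m_minus = doubled(M, window_states, hit=False)
m_plus = doubled(M, window_states, hit=True)

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # observed in s1 at t = 0, window not hit yet
for t in range(1, 4):
    v = vec_mat(v, m_plus if t in window_times else m_minus)

observed = 2   # second observation: object seen in s3 at t = 3
p_hit_given_obs = v[observed + n] / (v[observed] + v[observed + n])
```

Only the two classes that agree with the observation (S3− and S3+) enter the ratio, which reproduces 0.32 / (0.32 + 0.04) ≈ 0.89.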

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index leaf entries (labelled P, oid) over the time/location space]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
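The sampling idea can be sketched on the small three-state chain used for the window queries. This is a simplified illustration: it samples forward from a single observation only and does not implement the adapted transition matrices needed to conform with later observations; all function names are hypothetical.

```python
import random

def sample_path(m, start, steps, rng):
    """Draw one trajectory (list of state indices) from the Markov chain."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path

def estimate_window_probability(m, start, window_states, window_times, n_samples, seed=0):
    """Fraction of sampled trajectories that are in a window state at some window time."""
    rng = random.Random(seed)
    horizon = max(window_times)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, {2, 3}, n_samples=20000)
```

With enough samples the estimate approaches the exact window-query probability of this example (0.96); the same loop works for predicates such as kNN that have no simple matrix formulation.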

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728


63

Structure of the Answer to Probabilistic NN-query for Trajectories

We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.

[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level-1 nodes: [Tri1, tb, t1, D1], [Tri2, t1, t2, D2], …, [Trim, t(m-1), te, Dm]. Deeper nodes, e.g. [Tri11, tb, t11, D11], [Tri12, t11, t12, D12] and [Tri121, t11, t121, D121], refine the sub-intervals of their parents.]

Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]

Children:
- If the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent

64

Instantaneous (i.e. spatial) NN-query when the querying object is crisp (i.e. no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1–Tr5, with the distances Rmin, Rmax and Rd marked; for Tr4, Rmin > Rmax]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN

The probability that the location of Tri is within distance Rd from Q is given by

\[ P^{WD}_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy \]

The probability that a given object Trj is the NN of Q is given by

\[ P^{NN}_j = \int_0^{R_{max}} pdf^{WD}_j(R_d) \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\, dR_d \]

where $pdf^{WD}_j$ is the derivative of $P^{WD}_j$ with respect to $R_d$, i.e. Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
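The pruning observation can be sketched for disk-shaped uncertainty regions. The sketch assumes that Rmax is the smallest of the objects' maximum distances to Q (the usual reading of the figure); the disk coordinates are made up.

```python
import math

def prune_by_rmax(q, disks):
    """disks: dict name -> (cx, cy, radius). Returns (rmax, surviving candidate names)."""
    def centre_dist(d):
        return math.hypot(d[0] - q[0], d[1] - q[1])

    # Some object is certainly within rmax of q, whatever its true location is.
    rmax = min(centre_dist(d) + d[2] for d in disks.values())
    # An object whose minimum distance exceeds rmax can never be the NN.
    keep = [name for name, d in disks.items()
            if max(centre_dist(d) - d[2], 0.0) <= rmax]
    return rmax, keep

disks = {
    "Tr1": (2.0, 0.0, 1.0),   # distances to Q between 1 and 3
    "Tr2": (0.0, 3.0, 1.0),   # distances to Q between 2 and 4
    "Tr4": (6.0, 0.0, 1.0),   # distances to Q between 5 and 7
}
rmax, candidates = prune_by_rmax((0.0, 0.0), disks)
```

Only the surviving candidates need to enter the integral for the NN probability; here Tr4 is pruned because its minimum distance (5) exceeds Rmax (3).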

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:

[Figure: uncertain query disk TrQ and the uncertainty disks of Tr1–Tr5, with Rmin and Rmax marked; points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer "prune" Tr4 from consideration.

The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd of Trq.

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

Combine the two probabilitieshellip

Possible pathsPossible locations when t=4

a

m

Probability of falling inside segment mp1

0505 = 025 (at t = 4)

Probability of an object a falling inside some segment S at given time instant t

)(

))(Pr()Pr())(Pr(tPPp

p

a

tPLSpSta eg uniform pdf

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

A simple example

› Initial position: the ball is in the third hole

› After first hit: 0.4, 0.2, 0.4 (holes 2 to 4)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (holes 1 to 9)

93

How can we model this

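› A Markov Chain is a “memoryless” Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (row: from bucket, column: to bucket):

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$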

94

How can we model this

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)

Here M is the 9 × 9 transition matrix of the example (row: from bucket, column: to bucket).
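The vector-matrix products for the board example can be sketched in a few lines of plain Python. The function names (`build_board_matrix`, `step`, `after_hits`) are made up for this illustration; the matrix entries are the ones shown on the slides.

```python
def build_board_matrix(n=9):
    """Transition matrix of the n-hole board (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border hole
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border hole
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                           # inner hole: left, stay, right
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(p, M):
    """One hit: row vector p times matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(p, M, k):
    """Distribution after k hits, starting from distribution p."""
    for _ in range(k):
        p = step(p, M)
    return p


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # ball starts in the third hole
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])
print([round(x, 2) for x in fortieth])
```

Repeated multiplication is used instead of forming M^40 explicitly; for a sparse M this keeps every step proportional to the number of non-zero entries.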

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: state graph over s1, s2, s3 with the transition probabilities of M on the edges.]

Note: there is an exponential number of possible paths the car might take.

ST - Window Queries (slides 98–102)

Start distribution (1, 0, 0) at t = 0, propagated with M. Worlds that reach s1 or s2 during T = [2, 3] are counted as hits and are not propagated any further:

t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 is a hit
t = 3: without removing the hits, (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36); propagating only the remaining mass, (0, 0, 0.2) · M = (0, 0.16, 0.04), so another 0.16 is a hit

Result = 0.8 + 0.16 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
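The step-by-step evaluation of the window query can be sketched as a forward propagation that sets aside the probability mass of every world as soon as it enters the window. The function name `window_probability` and the 0-based state indices are assumptions of this sketch; the matrix and the query are the ones from the slides.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def step(p, M):
    """Row vector p times matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]


def window_probability(p0, M, window_states, t_start, t_end):
    """Probability of being in window_states at least once in [t_start, t_end]."""
    p = list(p0)   # mass of the worlds that have not hit the window yet
    hit = 0.0      # mass of the worlds that have already hit the window
    for t in range(t_end + 1):
        if t > 0:
            p = step(p, M)
        if t >= t_start:
            for s in window_states:
                hit += p[s]
                p[s] = 0.0   # counted once, no longer propagated
    return hit


# States s1, s2, s3 are indexed 0, 1, 2; the car starts in s1 at t = 0.
result = window_probability([1.0, 0.0, 0.0], M, {0, 1}, 2, 3)
print(round(result, 2))
```

The cost is one vector-matrix product per time step, so the exponential number of possible paths is never enumerated.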

103

ST - Window Queries

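State order: (s1, s2, s3, s4), where s4 is the new absorbing state for the winner trajectories.

$$
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

M^- is applied for transitions to a time outside the query window, M^+ for transitions to a time inside it (here t = 2 and t = 3):

(1, 0, 0, 0) · M^- = (0, 0, 1, 0)
(0, 0, 1, 0) · M^+ = (0, 0, 0.2, 0.8)
(0, 0, 0.2, 0.8) · M^+ = (0, 0, 0.04, 0.96)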
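The two matrices can be derived mechanically from M. The following is a minimal sketch, assuming that M^+ redirects every transition into a window state to the extra absorbing state; the helper name `augment` is invented here and is not taken from the cited paper.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def step(p, A):
    """Row vector p times matrix A."""
    n = len(A)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


def augment(M, window_states, inside_window):
    """Append an absorbing 'hit' state as the last row and column.

    If inside_window is True, transitions into window_states are
    redirected to the hit state (this yields M+); otherwise M is
    copied unchanged (this yields M-).
    """
    n = len(M)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (inside_window and j in window_states) else j
            A[i][target] += M[i][j]
    A[n][n] = 1.0   # once hit, always hit
    return A


window = {0, 1}                       # s1 and s2, 0-based
M_minus = augment(M, window, False)
M_plus = augment(M, window, True)

p = [1.0, 0.0, 0.0, 0.0]              # car in s1 at t = 0
p = step(p, M_minus)                  # t = 1 lies outside T = [2, 3]
p = step(p, M_plus)                   # t = 2 lies inside T
p = step(p, M_plus)                   # t = 3 lies inside T
print([round(x, 2) for x in p])       # last entry: probability of a hit
```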

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space, once with a single observation at t0 and once with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3.]

106

Multiple Observations

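Class order of the state vector: (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds with ¬∎ (window not intersected) and the classes Si+ the worlds with ∎.

t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$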

107

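Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ = “the trajectory intersects the query window” and obs = “the object is observed in s3 at t = 3”:

$$
P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}
$$

State vector at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

P(∎ | obs) = 0.32 / (0.32 + 0.04) ≈ 0.89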
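The 2|S| classes and the final conditioning step can be sketched as follows. The helper name `class_matrices` and the index layout (the classes Si- first, then the classes Si+) are assumptions of this sketch; the numbers are the ones of the running example.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def step(p, A):
    """Row vector p times matrix A."""
    n = len(A)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


def class_matrices(M, window_states):
    """Build M- and M+ over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Target time outside the window: classes keep their sign.
            minus[i][j] = M[i][j]
            minus[n + i][n + j] = M[i][j]
            # Target time inside the window: worlds that already hit stay
            # in the '+' classes, worlds entering a window state switch.
            plus[n + i][n + j] = M[i][j]
            if j in window_states:
                plus[i][n + j] = M[i][j]
            else:
                plus[i][j] = M[i][j]
    return minus, plus


n = len(M)
M_minus, M_plus = class_matrices(M, {0, 1})

p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # o is in s1 at t = 0, no hit yet
p = step(p, M_minus)                  # t = 1, outside T = [2, 3]
p = step(p, M_plus)                   # t = 2, inside T
p = step(p, M_plus)                   # t = 3, inside T

observed = 2                          # second observation: o is in s3 at t = 3
posterior = p[n + observed] / (p[observed] + p[n + observed])
print([round(x, 2) for x in p], round(posterior, 2))
```

Conditioning on the observation only needs the two entries that belong to the observed state, which is why the classes have to be tracked separately.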

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: location over time; label “Poid”.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
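The ratio estimate can be illustrated with a plain Monte-Carlo sketch on the three-state window query used earlier in the tutorial. Note the limits of this sketch: it samples forward from a single observation only and does not implement the adaptation of transition matrices of the cited paper; the function names and the fixed seed are choices made for this illustration.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def sample_trajectory(start, M, length, rng):
    """Draw one trajectory of `length` transitions from the Markov chain."""
    states = [start]
    for _ in range(length):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_window_probability(start, M, window_states,
                                t_start, t_end, n_samples, seed=0):
    """Fraction of sampled trajectories that visit the window in [t_start, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        trajectory = sample_trajectory(start, M, t_end, rng)
        if any(trajectory[t] in window_states
               for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples


# Car starts in s1 (index 0); window: s1 or s2 during T = [2, 3].
estimate = estimate_window_probability(0, M, {0, 1}, 2, 3, n_samples=20000)
print(estimate)   # should be close to the exact result 0.96
```

With a second observation, most forward samples would contradict it and have to be discarded, which is the inefficiency the bullet on conforming samples refers to.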

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013).

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.


64

Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)

[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd marked.]

Trq = 2D point Q. The rest are possible locations bounded by circles.

Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q’s NN

The probability that the location of Tri is within distance Rd from Q is given by

The probability that a given object Trj is the NN of Q is given by

i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd’s (NOTE: the upper bound is Rmax…)
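The two probabilities described in words above can be written out as follows. This is a reconstruction from the verbal description, not the notation of the original slide: the symbols pdf_i (location pdf of Tri), P^{WD}_i (within-distance probability) and P^{NN}_j are names assumed for this sketch.

```latex
% Within-distance probability: integrate the location pdf of Tr_i
% over the disk of radius R_d centred at Q (a double integral).
P^{WD}_i(R_d) \;=\; \iint_{\{(x,y)\,:\,\mathrm{dist}((x,y),\,Q)\,\le\,R_d\}} pdf_i(x,y)\; dx\, dy

% NN probability: Tr_j is at distance R_d (density of its distance)
% AND every other object is farther away than R_d,
% integrated over all possible R_d up to R_{max}.
P^{NN}_j \;=\; \int_{0}^{R_{max}} \frac{d\,P^{WD}_j(R_d)}{d R_d}\;
          \prod_{i \ne j} \bigl(1 - P^{WD}_i(R_d)\bigr)\; dR_d
```

The derivative of P^{WD}_j is zero below the minimum distance of Trj from Q, so starting the outer integral at 0 does not change the value.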

65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example:

[Figure: uncertain query TrQ and objects Tr1 to Tr5 with the distances Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax.]

We can no longer “prune” Tr4 from consideration.

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq.

NOTE: in effect a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane; each has a support of diameter 2r and height 1/(πr²).]

Example: assuming a uniform pdf of possible locations, the figure shows but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq.

Main Observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect we are looking for P(|Vi – Vq| ≤ Rd).

Define Viq = Vi – Vq (cross-correlation) as another random variable: Viq = Vi + (–Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and –Vq.
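The convolution argument can be tried out on a small discrete example. The sketch below is a deliberate simplification: it works in one dimension on an integer grid instead of on 2D disks, and the helper names and the concrete centres and radius are made up for the illustration.

```python
def uniform_pmf(lo, hi):
    """Uniform pmf over the integer grid points lo..hi (inclusive)."""
    n = hi - lo + 1
    return {x: 1.0 / n for x in range(lo, hi + 1)}


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_i - V_q for independent V_i and V_q.

    Summing over all pairs is the convolution of pmf_i with the
    mirrored pmf of V_q, i.e. with the pmf of -V_q.
    """
    out = {}
    for xi, pi in pmf_i.items():
        for xq, pq in pmf_q.items():
            out[xi - xq] = out.get(xi - xq, 0.0) + pi * pq
    return out


pmf_i = uniform_pmf(8, 12)     # object i: centre 10, radius r = 2
pmf_q = uniform_pmf(-2, 2)     # query q: centre 0, radius r = 2
pmf_iq = difference_pmf(pmf_i, pmf_q)

mean_iq = sum(x * p for x, p in pmf_iq.items())
print(min(pmf_iq), max(pmf_iq), round(mean_iq, 6))
```

In this toy setting the support of the difference has width 4r and its mean is the difference of the two centres; the probability of being within distance Rd is then a sum of `pmf_iq` over the values whose absolute value is at most Rd.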

67

[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the x-y plane (axis from -40 to +40); each has a support of diameter 4r.]

Let Viq = Vi + (-Vq)

Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin as a function of t is

diq(t) = sqrt(A t² + B t + C)

which is a hyperbola since A > 0

Now we have a collection of such distance functions (one for each i)
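The hyperbolic shape of the distance function follows from linear motion. The derivation below is a sketch; the symbols x_{iq} and y_{iq} for the expected location of TRiq at t = 0 are assumptions introduced here and do not appear on the slide.

```latex
% Expected location along TR_{iq} at time t (linear motion):
%   (x_{iq} + V_{x,iq}\, t,\;\; y_{iq} + V_{y,iq}\, t)
d_{iq}(t) = \sqrt{(x_{iq} + V_{x,iq}\, t)^2 + (y_{iq} + V_{y,iq}\, t)^2}
          = \sqrt{A\, t^2 + B\, t + C}

A = V_{x,iq}^2 + V_{y,iq}^2, \qquad
B = 2\,\bigl(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}\bigr), \qquad
C = x_{iq}^2 + y_{iq}^2
```

A is a sum of squares, so A > 0 whenever the relative velocity is non-zero, and d² = A t² + B t + C then describes a hyperbola in the (t, d) plane.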

69

The continuous aspect of the probabilistic NN queries

Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)

Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

Example: [Figure: distance (dist) over time (horizontal axis) for TR1 and TR2; their lower envelope LE12 starts at tb and changes at the critical time-points t11 and t12.]
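Finding the critical time-points of two distance functions amounts to solving one quadratic equation, which is why there are at most two of them. The sketch below assumes that each distance function is given by the coefficients (A, B, C) of its square, d(t)² = A t² + B t + C; the function name and the sample coefficients are hypothetical.

```python
import math


def critical_time_points(f, g, t_begin, t_end, eps=1e-12):
    """Times in [t_begin, t_end] at which two distance functions are equal.

    f and g are coefficient triples (A, B, C) of the squared distances.
    Distances are non-negative, so d_f(t) = d_g(t) exactly when the
    squared distances are equal.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []                       # no isolated intersection
        roots = [-c / b]                    # difference is linear
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []                       # the functions never meet
        sq = math.sqrt(disc)
        roots = sorted({(-b - sq) / (2 * a), (-b + sq) / (2 * a)})
    return [t for t in roots if t_begin <= t <= t_end]


# d_1(t)^2 = t^2 - 4t + 5 and d_2(t)^2 = 0.25 t^2 + 1 (made-up values)
points = critical_time_points((1.0, -4.0, 5.0), (0.25, 0.0, 1.0), 0.0, 5.0)
print(points)
```

Between two consecutive critical time-points the order of the two functions does not change, so the lower envelope can be described piece by piece.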

70

The continuous aspect of the probabilistic NN queries

Q: how to efficiently construct the lower envelope of the whole collection of distance-functions?

A: a divide-and-conquer approach in the spirit of Merge-Sort

[Figure: LE12, the lower envelope of TR1 and TR2 over [tb, te], with critical time-points t11 and t12; LE34, the lower envelope of TR3 and TR4, with critical time-point t31; both merged into LE1234, which adds the critical time-points t1new and t2new.]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, each path in PPi is selected with probability 1/|PPi|.

Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4; segment mp1 highlighted.]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t (e.g., uniform pdf):

$$
\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(PL_p(t) \in S)
$$
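Combining the path probability with the location probability can be sketched as a weighted sum over the possible paths. The data layout below (one tuple per path with the possible-location interval and the part of S on that path, both as offsets along the path) and all numbers are hypothetical; they are chosen so that the result matches the 0.5 · 0.5 example.

```python
def overlap(a, b):
    """Length of the intersection of two intervals (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths):
    """Sum over paths of Pr(path) * Pr(location in S | path).

    Each entry is (path_probability, possible_interval, segment_interval);
    segment_interval is None if S does not lie on that path. Locations
    are assumed uniform over the possible interval.
    """
    total = 0.0
    for path_prob, possible, segment in paths:
        if segment is None:
            continue
        length = possible[1] - possible[0]
        total += path_prob * overlap(possible, segment) / length
    return total


paths_at_t4 = [
    # Path 1: half of its possible locations fall inside S.
    (0.5, (2.0, 6.0), (4.0, 6.0)),
    # Path 2: S does not lie on this path.
    (0.5, (1.0, 5.0), None),
]
probability = prob_in_segment(paths_at_t4)
print(probability)
```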

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on-demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: expansion tree “bubbling” along road-network segments from q, with vertices A to E annotated with earliest/latest times of arrival; one path is possible but definitely not part of the answer to the range query.]

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data-size

John Hershberger, Jack Snoeyink, “Speeding up Douglas-Peucker Line-simplification Algorithm,” SSHD 1997
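Trading an acceptable error for a smaller data size can be illustrated with line simplification. The sketch below is the basic recursive Douglas-Peucker procedure, not the sped-up variant of the cited paper; the function names, the sample track and the tolerance are made up for the illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within epsilon of the polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]     # all inner points are dropped
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right               # the split point is kept once


track = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 0.0),
         (4.0, 3.0), (5.0, 3.05), (6.0, 3.0)]
reduced = douglas_peucker(track, epsilon=0.1)
print(reduced)
```

The tolerance epsilon plays the role of the acceptable error: every dropped point lies within epsilon of the segment that replaces it, which bounds the uncertainty introduced by the reduction.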

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long may the data at the sink be “un-fresh”?

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar, “Querying imprecise data in moving object environments,” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, “Probabilistic Similarity Search for Uncertain Time Series,” SSDBM 2009, 435–443.

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012



65

When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable…

Example:

[Figure: uncertain query trajectory TrQ, candidate trajectories Tr1–Tr5, radii Rmin and Rmax, and two points Z1 and Z2 with dist(Z1, Z2) < Rmax]

We can no longer “prune” Tr4 from consideration.

The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq.

NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…

66

[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane; each is a cylinder with base diameter 2r and height 1/(πr²)]

Example, assuming a uniform pdf of possible locations: the figure shows only two of the points that have to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq.

Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(dist(Vi, Vq) ≤ Rd).

Define
Viq = Vi – Vq (cross-correlation) as another random variable:
Viq = Vi + (–Vq),
which, due to the independence of Vi and Vq, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq.
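The probability of Tri being within distance Rd of Trq can also be approximated by sampling instead of evaluating the integral. The following is a minimal sketch, assuming uniform pdfs over uncertainty disks of equal radius r; the function names, sample count and seed are illustrative assumptions and do not come from the slides. Each sampled pair of locations gives one sample of the difference variable Viq = Vi – Vq.

```python
import math
import random


def sample_uniform_disk(cx, cy, r, rng):
    """Draw one point uniformly from the disk of radius r centred at (cx, cy)."""
    rho = r * math.sqrt(rng.random())   # sqrt makes the density uniform over the area
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def prob_within_distance(ci, cq, r, rd, n=20000, seed=0):
    """Monte Carlo estimate of P(dist(Vi, Vq) <= rd) for independent uniform disks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi, yi = sample_uniform_disk(ci[0], ci[1], r, rng)
        xq, yq = sample_uniform_disk(cq[0], cq[1], r, rng)
        # (xi - xq, yi - yq) is one sample of the difference variable Viq
        if math.hypot(xi - xq, yi - yq) <= rd:
            hits += 1
    return hits / n
```

For instance, `prob_within_distance((2.0, 0.0), (0.0, 0.0), r=1.0, rd=2.0)` estimates the probability for an object whose expected location lies 2 units away from the expected query location.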

67

[Figure: the convolved pdfs pdf(Tr1 – Trq) and pdf(Tr2 – Trq) over the x-y plane; each has a base diameter of 4r and a peak height of 3/(4πr²)]

Let Viq = Vi + (–Vq)

Property 1
If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq).

Property 2
Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs).
THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin as a function of t is

\[ d_{iq}(t) = \sqrt{A t^2 + B t + C} \]

which is a hyperbola since A > 0 (A = Vxiq² + Vyiq²; B and C are determined by the initial relative position).

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0).

Observation: Throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

Example:

[Figure: lower envelope LE12 of the two distance functions TR1 and TR2, starting at tb, with critical time-points t11 and t12; distance on the vertical axis, time on the horizontal axis]
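Since the squared distance of a linearly moving expected location is a quadratic in t, the critical time-points of two distance functions are the roots of a quadratic equation, which is why there are at most two. A minimal sketch, assuming each distance function is represented by the coefficients (A, B, C) of d(t)² = A·t² + B·t + C; the helper names are hypothetical.

```python
import math


def distance_coeffs(p0, v):
    """(A, B, C) of d(t)^2 = A t^2 + B t + C for relative start p0 and relative velocity v."""
    (x0, y0), (vx, vy) = p0, v
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal (at most two)."""
    # d_i(t) = d_j(t) holds exactly when the squared distances are equal.
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []                      # no isolated intersection
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]
```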

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: A divide-and-conquer approach in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
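The merge-sort style construction can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes every distance function is given as the coefficient triple (A, B, C) of its squared distance, and it compares squared distances, which yields the same envelope because distances are non-negative. An envelope is a list of pieces (start, end, index of the lowest function); all names are hypothetical.

```python
import math


def _val(f, t):
    return f[0] * t * t + f[1] * t + f[2]


def _roots(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where the quadratics f and g are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        cand = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        s = math.sqrt(disc) if disc >= 0 else None
        cand = [] if s is None else [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in set(cand) if lo < t < hi)


def merge(funcs, env1, env2):
    """Merge two envelopes that cover the same time span."""
    out, i, j = [], 0, 0
    while i < len(env1) and j < len(env2):
        s1, e1, k1 = env1[i]
        s2, e2, k2 = env2[j]
        lo, hi = max(s1, s2), min(e1, e2)
        cuts = [lo] + _roots(funcs[k1], funcs[k2], lo, hi) + [hi]
        for a, b in zip(cuts, cuts[1:]):
            mid = 0.5 * (a + b)            # the order cannot change inside (a, b)
            k = k1 if _val(funcs[k1], mid) <= _val(funcs[k2], mid) else k2
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)   # extend the previous piece
            else:
                out.append((a, b, k))
        if e1 == hi:
            i += 1
        if e2 == hi:
            j += 1
    return out


def lower_envelope(funcs, tb, te, ids=None):
    """Divide and conquer: split the functions, recurse, merge the two envelopes."""
    ids = list(range(len(funcs))) if ids is None else ids
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(funcs,
                 lower_envelope(funcs, tb, te, ids[:half]),
                 lower_envelope(funcs, tb, te, ids[half:]))
```

The breakpoints between consecutive pieces of the returned envelope are the critical time-points that survive on the envelope.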

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially).

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.

[Figure: distance functions TR1–TR7 over [tb, te] with critical time-points t1, t2, t3; shown are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the pruning band of width 4r, which excludes TR7]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti.

Example: if the pdf of selecting a possible path is uniform, each possible path is selected with probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t).
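The definition of the possible paths can be sketched as a bounded depth-first search. This is a minimal sketch under stated assumptions: the road network is a hypothetical dictionary mapping each vertex to its neighbours and the minimum travel time of the connecting edge, both samples lie on vertices, and only simple paths are considered; none of the names come from the slides.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose minimum time cost is at most budget."""
    paths = []

    def dfs(v, cost, path):
        if cost > budget:
            return                      # too slow to arrive by the next sample
        if v == dst:
            paths.append((list(path), cost))
            return
        for w, travel_time in graph[v].items():
            if w not in path:
                path.append(w)
                dfs(w, cost + travel_time, path)
                path.pop()

    dfs(src, 0.0, [src])
    return paths


# Hypothetical network: three routes from p_i to p_next with time costs 5, 6 and 10.
graph = {
    "p_i": {"v1": 2.0, "v2": 3.0, "v3": 5.0},
    "v1": {"p_i": 2.0, "p_next": 3.0},
    "v2": {"p_i": 3.0, "p_next": 3.0},
    "v3": {"p_i": 5.0, "p_next": 5.0},
    "p_next": {"v1": 3.0, "v2": 3.0, "v3": 5.0},
}
pp = possible_paths(graph, "p_i", "p_next", budget=6.0)
uniform_path_probability = 1.0 / len(pp)   # uniform pdf over the possible paths
```

With a time budget of 6 between the two samples, the route via v3 is excluded and the two remaining paths each receive probability 0.5 under the uniform assumption.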

73

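Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4; m marks a point on one of the possible paths]

Probability of falling inside segment mp1:

0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

\[ \Pr(a \in S \text{ at } t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a \in S \mid PL_p^a(t)\big) \]

(e.g. uniform pdf)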
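The combination of path probability and location probability can be sketched for the uniform case. The sketch below assumes that, for every possible path, the possible locations at time t and the part of segment S on that path are given as (start, end) offsets along the path; this representation and the function names are assumptions made for illustration.

```python
def overlap(a, b):
    """Length of the intersection of two intervals (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths):
    """Sum over the possible paths of Pr(path) * Pr(inside S | possible locations).

    paths: list of (path_probability, possible_locations, segment), where the
    last two entries are (start, end) offsets along that path.
    """
    total = 0.0
    for path_prob, possible_locations, segment in paths:
        pl_length = possible_locations[1] - possible_locations[0]
        if pl_length > 0:
            # uniform pdf over the possible locations on this path
            total += path_prob * overlap(possible_locations, segment) / pl_length
    return total


# Two equally likely paths; S covers half of the possible locations on the first
# path and does not touch the second one.
example = [
    (0.5, (2.0, 6.0), (2.0, 4.0)),
    (0.5, (1.0, 5.0), (8.0, 9.0)),
]
result = prob_in_segment(example)   # 0.5 * 0.5 + 0.5 * 0.0
```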

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1.

At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A–E and query point q; the expansion tree “bubbles” along road-network segments; annotations mark the earliest/latest times of arrival and a possible path that is definitely not part of the answer to the range query]
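The expansion and filtering steps can be sketched with a distance-bounded Dijkstra search. This is a simplified sketch, not the system described on the slide: the query point q is assumed to lie on a vertex, and a plain dictionary from edge to a list of (object, enter time, leave time) entries stands in for the per-edge 1D movement R-tree. All names are hypothetical.

```python
import heapq


def expansion(graph, q, r):
    """Vertices with network distance < r from q, plus the edges incident to them."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                     # stale heap entry
        for w, length in graph[v].items():
            nd = d + length
            if nd < r and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    # an edge belongs to the expansion if at least one endpoint is closer than r
    edges = {frozenset((v, w)) for v in dist for w in graph[v]}
    return dist, edges


def filter_candidates(movements, edges, tq):
    """Objects with a movement entry on an expanded edge that overlaps tq = (from, to)."""
    candidates = set()
    for edge in edges:
        for obj, t_enter, t_leave in movements.get(edge, []):
            if t_enter <= tq[1] and tq[0] <= t_leave:
                candidates.add(obj)
    return candidates
```

Only the returned candidates need the expensive refinement step, i.e. the calculation of the qualification probability.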

77

Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding Up the Douglas-Peucker Line-Simplification Algorithm”, SSHD 1997
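To make the idea of trading an acceptable error for data size concrete, the following sketch shows the basic recursive Douglas-Peucker simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and it measures the deviation as the distance to the segment between the first and last point; the function names and the parameter `epsilon` (the acceptable error) are illustrative.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))            # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right             # the split point appears only once
```

A larger `epsilon` transmits fewer location points at the price of a larger location uncertainty for the receiver.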

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a “Z” axis… one cannot travel back in time
– How long may the data at the sink be “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: Independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009, 435-443.

83

A highway example…

[Figure: position (pos) of a car over time (t), with a query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: as before, with the reachable positions bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with the reachable positions bounded by vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; the car is inside Q with probability 50% at one time instant and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5
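The gap between the two numbers can be reproduced with a tiny possible-worlds enumeration. The ten equally likely worlds below are hypothetical and chosen only so that the marginal probabilities match the 50% and 30% of the example; the point is that every world inside Q at the second time instant was already inside Q at the first one, so the two events are far from independent.

```python
# Each world states whether the car is inside Q at the first and at the second
# time instant. Hypothetical worlds, all equally likely.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

p_first = sum(a for a, _ in worlds) / len(worlds)     # marginal at the first instant
p_second = sum(b for _, b in worlds) / len(worlds)    # marginal at the second instant

# Treating the two time instants as independent:
independent = 1 - (1 - p_first) * (1 - p_second)

# Respecting the dependency: count the worlds that are inside Q at least once.
dependent = sum(a or b for a, b in worlds) / len(worlds)
```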

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob. Stochastic Processes. Wiley, 1953.

91

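A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; from an inner hole the ball moves to each neighbour with probability 0.4 and stays with probability 0.2; at the border it moves to its only neighbour with probability 0.6 and stays with probability 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08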

93

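How can we model this

› A Markov Chain is a “memoryless” Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[ M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix} \]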

94

How can we model this

› First hit

\[ (0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \]

› Second hit

\[ (0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{2} = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0) \]

› 40th hit

\[ (0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08) \]
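The repeated hits can be simulated by multiplying the distribution vector with the transition matrix. The sketch below uses plain Python lists; the helper names are hypothetical, and the matrix is the nine-bucket board matrix M of the example. The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M, which the distribution approaches as the number of hits grows.

```python
def step(dist, matrix):
    """One hit: row vector times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def build_board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    return m


M = build_board_matrix()
start = [0.0] * 9
start[2] = 1.0                                   # the ball starts in the third bucket

history = [start]
for _ in range(40):
    history.append(step(history[-1], M))         # history[k] = distribution after k hits
```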

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
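Learning the transition probabilities from historical data can be sketched as simple transition counting. This is one straightforward estimator (relative frequencies), given here as an assumption since the slides do not state which estimator is used; the sparse matrix is stored as a dictionary of dictionaries and all names are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Relative transition frequencies from sequences of discretized states.

    trajectories: iterable of state sequences, one state per tick.
    Returns a sparse matrix as {from_state: {to_state: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in trajectories:
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }
```

Separate matrices for different times of day or object groups can be obtained by calling the function on the corresponding subsets of the historical trajectories.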

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: transition graph over the states s1, s2, s3 with the probabilities of M as edge labels]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

› Distribution at t=0: (1, 0, 0)

99

ST - Window Queries

› Distribution at t=1: (1, 0, 0) · M = (0, 0, 1)

100

ST - Window Queries

› Distribution at t=2: (0, 0, 1) · M = (0, 0.8, 0.2)

101

ST - Window Queries

› Distribution at t=3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)

102

ST - Window Queries

› Worlds that are in s1 or s2 at t=2 already satisfy the query (probability 0.8); propagating only the remaining (0, 0, 0.2) yields (0, 0.16, 0.04) at t=3

› Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

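ST - Window Queries

\[ M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

[Figure: the states s1, s2, s3 and the new winner state s4, unrolled over time]

› M⁻ is used for transitions to a time outside T, M⁺ for transitions to a time inside T; in M⁺ all transitions into s1 or s2 are redirected to s4

› Distributions over (s1, s2, s3, s4): (1, 0, 0, 0) at t=0, (0, 0, 1, 0) at t=1, (0, 0, 0.2, 0.8) at t=2, (0, 0, 0.04, 0.96) at t=3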
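The window query with the additional winner state can be sketched as follows. The sketch builds both augmented matrices from M and applies the matrix with redirected transitions for every step that leads into the query interval. It assumes that the start time t=0 lies before the interval; the function names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def augment(m, window_states=()):
    """Append an absorbing winner state; transitions into window_states go there."""
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if j in window_states else j
            out[i][target] += m[i][j]
    out[n][n] = 1.0                      # once a winner, always a winner
    return out


def window_probability(m, start, window_states, t_from, t_to):
    """P(object is in one of window_states at some t in [t_from, t_to])."""
    m_minus = augment(m)                 # no redirection outside the interval
    m_plus = augment(m, window_states)   # redirection inside the interval
    v = [1.0 if i == start else 0.0 for i in range(len(m))] + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window_states={0, 1}, t_from=2, t_to=3)
```

For the running example (start in s1, window states s1 and s2, T = [2,3]) this reproduces the result 0.96 with three vector-matrix multiplications, without enumerating the possible paths.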

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: location space over time space; left: a single observation at t0, right: two observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si⁻ corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si⁺ corresponding to worlds where o is located in state si and o has intersected the window

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

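Multiple Observations

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \]

\[ M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3; ∎ marks worlds that have intersected the window, not ∎ worlds that have not]

› Distributions over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺): (1, 0, 0, 0, 0, 0) at t=0, (0, 0, 1, 0, 0, 0) at t=1, (0, 0, 0.2, 0, 0.8, 0) at t=2, (0, 0, 0.04, 0.48, 0.16, 0.32) at t=3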

107

Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 (obs)?

\[ P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} \]

› With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺):

\[ \frac{0.32}{0.32 + 0.04} = 0.89 \]
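The doubled state space and the final conditioning step can be sketched as follows. The sketch assumes a first observation at t=0 and a second observation at the last time of the query interval, as in the running example; the function names are hypothetical. States 0..n-1 stand for the classes Si⁻ and states n..2n-1 for the classes Si⁺.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def doubled_matrices(m, window_states):
    """Build the 2|S| x 2|S| matrices for steps outside and inside the interval."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m[i][j]              # miss worlds stay miss worlds
            m_minus[n + i][n + j] = m[i][j]      # hit worlds stay hit worlds
            # inside the interval, entering a window state turns a miss into a hit
            m_plus[i][n + j if j in window_states else j] = m[i][j]
            m_plus[n + i][n + j] = m[i][j]
    return m_minus, m_plus


def hit_given_observation(m, start, window_states, t_from, t_to, observed):
    """P(window was hit | object observed in state `observed` at time t_to)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    miss, hit = v[observed], v[n + observed]
    # Bayes: P(hit and obs) / (P(hit and obs) + P(miss and obs));
    # the denominator is zero if the observation is impossible under the model.
    return hit / (hit + miss), v


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior, final = hit_given_observation(M, 0, {0, 1}, 2, 3, observed=2)
```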

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: time/location space with an index entry labelled P, oid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3), 205-216 (2013).
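One way to adapt the transition matrices so that every sample conforms to a later observation is to reweight each transition by the probability of still reaching the observed state. The sketch below uses this standard backward-reweighting idea for Markov chains; it is given as an illustration and is not claimed to be the exact algorithm of the cited paper. It assumes one observation at t=0 and one at the final tick, and all names are hypothetical.

```python
import random


def backward_messages(m, steps, observed_final):
    """beta[t][s] = P(state at time `steps` is observed_final | state s at time t)."""
    n = len(m)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][observed_final] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_conforming_path(m, start, beta, rng):
    """Sample one path from `start` that ends in the observed final state."""
    if beta[0][start] == 0.0:
        raise ValueError("the final observation is unreachable from the start state")
    states = list(range(len(m)))
    path = [start]
    for t in range(len(beta) - 1):
        i = path[-1]
        # adapted transition probabilities, proportional to M[i][j] * beta[t+1][j]
        weights = [m[i][j] * beta[t + 1][j] for j in states]
        path.append(rng.choices(states, weights=weights)[0])
    return path


def estimate_window_probability(m, start, steps, observed_final,
                                window_states, t_from, t_to, n=20000, seed=0):
    """Monte Carlo estimate of the window-query probability given both observations."""
    rng = random.Random(seed)
    beta = backward_messages(m, steps, observed_final)
    hits = 0
    for _ in range(n):
        path = sample_conforming_path(m, start, beta, rng)
        if any(path[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n
```

Because no sample is rejected, every drawn path contributes to the estimate; the same sampled paths could be reused for queries such as kNN that lack a closed-form matrix solution.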

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› Event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› Query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728.

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3), 205-216 (2013).

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009, 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728.

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 66: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

66

1

x

y

pdf

0

2r

1r2

pdf(Tr2)

pdf(Trq) pdf(Tr1)

Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq

Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for

DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

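Combine the two probabilities…

[Figure: possible paths of object a, and its possible locations when t = 4, including the segment m-p1.]

Probability of falling inside segment m-p1: 0.5 × 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a, t, S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \mid PL^{a}_{p}(t)\big)$

where Pr(p) is the probability of path p and the second factor is the probability of lying inside S among the possible locations along p (e.g., uniform pdf).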
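The combination of path probability and location probability can be reproduced in a few lines. The segment names and the two-path layout below are hypothetical stand-ins for the figure; only the factors 0.5 and 0.5 come from the slide.

```python
def prob_in_segment(paths, segment):
    """Sum over possible paths of Pr(path) * Pr(location in segment | path).

    paths: list of (path_probability, {segment_name: location_probability}).
    """
    return sum(p_path * locs.get(segment, 0.0) for p_path, locs in paths)

# Two equally likely possible paths; on the first one, half of the possible
# locations at t = 4 fall inside the segment m-p1 (uniform pdf assumed).
paths = [
    (0.5, {"m-p1": 0.5, "rest-of-path-1": 0.5}),
    (0.5, {"path-2": 1.0}),
]
print(prob_in_segment(paths, "m-p1"))  # 0.25
```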

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash-table
› Small, in-memory
– Movement R-tree
› 1D, one for each edge
– Trajectories list
› Stores samples ordered by time-stamp
› Assigns a pointer to the set of possible paths
› Earliest-arriving and latest-departure times stored for vertices
› On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distance from q is smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are the candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree “bubbles” along road-network segments. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query.]
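The network-expansion step can be sketched with a distance-bounded Dijkstra search. This is an illustrative simplification: it assumes q is a vertex of the network (not a point on an edge), and the adjacency-dict graph encoding and the name `expansion_edges` are assumptions made here.

```python
import heapq

def expansion_edges(graph, q, r):
    """Edges touched when expanding from q up to network distance r.

    graph: {vertex: [(neighbour, edge_length), ...]}, undirected, listed both ways.
    Returns the set of edges (u, v), u < v, that have at least one endpoint at a
    network distance smaller than r from q.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")) or d >= r:
            continue                      # stale entry, or outside the range
        for v, w in graph[u]:
            edges.add((min(u, v), max(u, v)))
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges

graph = {
    "q": [("a", 1.0), ("d", 4.0)],
    "a": [("q", 1.0), ("b", 2.0)],
    "b": [("a", 2.0), ("c", 5.0)],
    "c": [("b", 5.0)],
    "d": [("q", 4.0)],
}
print(sorted(expansion_edges(graph, "q", 3.5)))
```

In the filter step, only the movement R-trees of the returned edges would need to be fetched.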

77

Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
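To make the "acceptable error" idea concrete, here is a sketch of the basic recursive Douglas-Peucker simplification on 2D points. It is the plain textbook version, not the sped-up variant of the cited paper, and it ignores time-stamps; the function names are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))             # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep only the points needed to stay within eps of the original polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1   # index of farthest point
    if dists[k - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:k + 1], eps)
    right = douglas_peucker(points[k:], eps)
    return left[:-1] + right               # the split point appears only once

track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.1))
```

With the error bound 0.1, the nearly collinear point (1, 0.05) is dropped and need not be transmitted.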

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– Time is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension

vs

Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension

82

Merge the approaches

› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with the query window Q.]

What is the probability that the car is in some area at least once during some time interval?

84

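A highway example…

[Figure: position (pos) over time (t) with the query window Q; the speed bounds vmin and vmax delimit the reachable positions.]

What is the probability that the car is in some area at least once during some time interval?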

85

A highway example…

[Figure: position (pos) over time (t) with the speed bounds vmin and vmax and the query window Q.]

Assuming uniform distribution…

86

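A highway example…

[Figure: position (pos) over time (t) with the query window Q; 50% and 30% of the position distribution fall inside Q at the two time instants.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65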

87

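A highway example…

[Figure: position (pos) over time (t) with the query window Q; 50% and 30% of the position distribution fall inside Q at the two time instants.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints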

88

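A highway example…

[Figure: position (pos) over time (t) with the query window Q; 50% and 30% of the position distribution fall inside Q at the two time instants.]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5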
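The gap between 0.65 and 0.5 can be reproduced with a toy set of possible worlds. A union probability of 0.5 with marginals 0.5 and 0.3 implies that the worlds inside Q at the second instant are contained in those inside Q at the first; the ten equally likely worlds below are a hypothetical discretisation chosen only to show this.

```python
from fractions import Fraction

N = 10                          # equally likely possible trajectories (worlds)
in_q_first = set(range(5))      # 50% of the worlds are inside Q at the first instant
in_q_second = set(range(3))     # 30% at the second instant, all also inside at the first

p1 = Fraction(len(in_q_first), N)
p2 = Fraction(len(in_q_second), N)

# Treating the two instants as independent events
independent = 1 - (1 - p1) * (1 - p2)
# Counting the worlds that are inside Q at least once
dependent = Fraction(len(in_q_first | in_q_second), N)

print(float(independent), float(dependent))  # 0.65 0.5
```

The independence formula overcounts because it also gives weight to trajectories that would have to break the speed constraints.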

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model that can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g., Markov Chain)
– Discrete Time + Continuous Space (e.g., Harris Chain)
– Continuous Time + Discrete Space (e.g., Markov Process)
– Continuous Time + Continuous Space (e.g., Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: an interior hole (left 0.4, stay 0.2, right 0.4) and a border hole (move to the neighbouring hole 0.6, stay 0.4).]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

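How can we model this?

› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$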

94

How can we model this?

› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)

› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)

› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 = (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
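The vector-matrix products above can be replayed in plain Python. The helper names `build_matrix` and `step` are illustrative; the probabilities are the ones of the nine-bucket board.

```python
def build_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(v, M):
    """One hit: row vector v times matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

M = build_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
first = step(start, M)
second = step(first, M)
print([round(x, 2) for x in first])
print([round(x, 2) for x in second])

# The distribution shown for the 40th hit is a fixed point of M
pi = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in step(pi, M)])
```

Since the 40th-hit vector is unchanged by a further multiplication with M, it is the stationary distribution of this chain.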

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: state-transition diagram over s1, s2, s3 with the transition probabilities of M.]

Note: there is an exponential number of possible paths the car might take.

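ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M.]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)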

98

99

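ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M.]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)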

100

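ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M.]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)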

101

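ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M.]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)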

102

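ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M.]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3, restricted to the worlds that have not hit the window at t=2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the “winner” trajectories (those that have hit the window) and two matrices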

103

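ST - Window Queries

States: s1, s2, s3, and the new absorbing state s4 for trajectories that have hit the window.

$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$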
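The two-matrix scheme can be sketched as follows. The helpers `augmented` and `window_probability` are illustrative names; the sketch assumes a single observation at t = 0 and a window that starts at t ≥ 1, and it builds M⁻ and M⁺ by adding one absorbing "hit" state and, for M⁺, redirecting every transition into a window state to it.

```python
def step(v, M):
    """Row vector v times matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def augmented(M, window_states=frozenset()):
    """Add an absorbing 'hit' state; transitions into window states lead to it."""
    n = len(M)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if j in window_states else j
            A[i][target] += M[i][j]
    A[n][n] = 1.0                         # once hit, always hit
    return A

def window_probability(M, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus = augmented(M)                # used for steps outside the time window
    m_plus = augmented(M, window_states)  # used for steps into the time window
    v = [0.0] * (len(M) + 1)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        v = step(v, m_plus if t >= t_from else m_minus)
    return v[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# car starts in s1 (index 0); window = {s1, s2}, T = [2, 3]
print(round(window_probability(M, 0, {0, 1}, 2, 3), 2))  # 0.96
```

Only vector-matrix products are needed, so the exponential number of possible paths is never enumerated.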

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, the first with a single observation at t0, the second with observations at t0 and t1.]

105

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3.]

106

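Multiple Observations

State order: (S1-, S2-, S3-, S1+, S2+, S3+)

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$

M⁻ is used for transitions into time steps outside the query window (not ∎), M⁺ for transitions into time steps inside it.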

107

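Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Let ∎ denote the event that the trajectory intersects the query window, and let obs denote the observation of the object in s3 at t = 3:

$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$

With the state vector (S1-, S2-, S3-, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3:

$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89$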
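The 2|S|-class bookkeeping and the final Bayes step can be sketched directly on the three-state example. The function name `propagate_hit_classes` is illustrative, and the sketch assumes the second observation is made at the last time step of the window.

```python
def propagate_hit_classes(M, start, window_states, t_from, t_to):
    """State vector over (S1-, ..., Sn-, S1+, ..., Sn+) at time t_to.

    Index i     : object in state i, window not intersected so far (Si-).
    Index n + i : object in state i, window already intersected (Si+).
    """
    n = len(M)
    v = [0.0] * (2 * n)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        nxt = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                enters = t >= t_from and j in window_states
                nxt[n + j if enters else j] += v[i] * M[i][j]   # from Si-
                nxt[n + j] += v[n + i] * M[i][j]                # from Si+
        v = nxt
    return v

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
v = propagate_hit_classes(M, 0, {0, 1}, 2, 3)
print([round(x, 2) for x in v])

observed = 2                               # object seen in s3 at t = 3
hit, miss = v[len(M) + observed], v[observed]
posterior = hit / (hit + miss)             # Bayes: P(hit | observation)
print(round(posterior, 2))  # 0.89
```

Conditioning on the observation only compares the two classes of the observed state, S3+ and S3-.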

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries (P, oid) in the time × location space.]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
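A minimal Monte-Carlo sketch, assuming the three-state chain of the window-query example and a single observation at t = 0, so plain forward sampling suffices. Conforming to a later observation would need the adapted transition matrices mentioned above, which this sketch does not implement. The function names and the fixed seed are illustrative.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_trajectory(M, start, steps, rng):
    """Draw one possible trajectory (list of states) from the Markov chain."""
    states, s = [start], start
    for _ in range(steps):
        s = rng.choices(range(len(M)), weights=M[s])[0]
        states.append(s)
    return states

def estimate_window_probability(M, start, window_states, t_from, t_to,
                                n_samples, seed=0):
    """Ratio of sampled trajectories that are in a window state during [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, t_to, rng)
        if any(traj[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples

estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000)
print(estimate)   # close to the exact window-query result 0.96
```

The same sampling loop works for predicates such as kNN, where no closed-form matrix solution is at hand; only the test inside the loop changes.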

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes’ Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 67: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

67

1

x

y

pdf

0-40

+40

4r

34r2

pdf(Tr2 - Trq)

pdf(Tr1 - Trq)

Let Viq = Vi + (-Vq)

Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)

THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)

Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq

68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq

Then the distance of the expected location along TRiq from the origin at a function of t is

hyperbola since A gt 0

Now we have a collection of such distance functions (for each i)

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

Combine the two probabilitieshellip

Possible pathsPossible locations when t=4

a

m

Probability of falling inside segment mp1

0505 = 025 (at t = 4)

Probability of an object a falling inside some segment S at given time instant t

)(

))(Pr()Pr())(Pr(tPPp

p

a

tPLSpSta eg uniform pdf

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

› We need to track where true hit worlds are located

– 2·|S| classes of equivalent worlds

– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window

– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

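Classes, in order: S1-, S2-, S3-, S1+, S2+, S3+ ("-": not ∎, "+": ∎)

State vectors:

t=0: (1, 0, 0, 0, 0, 0)

t=1: (0, 0, 1, 0, 0, 0)

t=2: (0, 0, 0.2, 0, 0.8, 0)

t=3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$

$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$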

107

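Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ = "the trajectory passes the query window" and O = "the object is observed in s3":

$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare) \cdot P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$

State vector over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0.16, 0.04, 0.48, 0, 0.32)

$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} = 0.89$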
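The 0.89 above can be reproduced with the doubled state space. The following is a minimal sketch, not the authors' implementation: `build_matrices` and the 0-based class order (S1-, S2-, S3-, S1+, S2+, S3+) are illustrative choices, and it assumes, as the state vectors on the previous slide suggest, that M+ is used only for the transition into t = 2.

```python
M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}  # s1 and s2


def build_matrices(m, window_states):
    """Return (M-, M+) over the classes S1-..Sn-, S1+..Sn+."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m[i][j]          # "-" worlds stay "-"
            m_minus[n + i][n + j] = m[i][j]  # "+" worlds stay "+"
            m_plus[n + i][n + j] = m[i][j]   # "+" worlds stay "+"
            if j in window_states:
                m_plus[i][n + j] = m[i][j]   # "-" world enters the window
            else:
                m_plus[i][j] = m[i][j]       # "-" world stays outside
    return m_minus, m_plus


def vec_mat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


m_minus, m_plus = build_matrices(M, WINDOW_STATES)
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # observed in s1 at t=0, window not hit yet
for t in (1, 2, 3):
    v = vec_mat(v, m_plus if t == 2 else m_minus)

# Second observation: the object is seen in s3 (classes S3- and S3+).
posterior = v[5] / (v[2] + v[5])
print(round(posterior, 2))  # 0.89
```

Conditioning on the observation amounts to discarding every class that contradicts it and renormalising the rest.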

108

Summary

› Pros

– Allows answering time-parametrized queries according to possible worlds semantics

– Considers location dependencies over time

– Scales up very well, since it is purely based on sparse matrix multiplications

– Natively extendable for uncertain observations

– Seems to work adequately on real-world data (more validation needed)

› Cons

– Discrete time and space

– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index over location and time, with leaf entries pointing to object ids (Poid)]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling

– Draw a sufficiently high number of samples

– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
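To illustrate the sampling idea, here is a minimal sketch of plain forward Monte-Carlo sampling on the three-state chain used for the window queries. It assumes a single observation (the start state) and therefore does not implement the adapted transition matrices of the cited paper; the function names, the sample count and the seed are arbitrary choices for this example.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def sample_trajectory(m, start, steps, rng):
    """Draw one possible trajectory by following the transition probabilities."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


def estimate_window_probability(m, start, steps, window_states, window_times,
                                n_samples, seed=0):
    """Ratio of sampled trajectories that are in a window state at a window time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_trajectory(m, start, steps, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples


estimate = estimate_window_probability(M, 0, 3, {0, 1}, (2, 3), 20000)
print(estimate)  # should be close to the exact value 0.96
```

With a second observation, most forward samples would contradict it and have to be rejected, which is the inefficiency the slide points at.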

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data

– Cleaning

– Approximation

– Paradigm of equivalent worlds

› Geometric models can be used

– when no probabilistic model is present

– to find possible answers (no confidence)

› Probabilistic management of UST data

– based on stochastic processes

– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


68

The continuous aspect of the probabilistic NN queries

Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.

Then the distance of the expected location along TRiq from the origin, as a function of t, is

$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$ and constants B, C determined by the initial relative position,

which is a hyperbola since A > 0.

Now we have a collection of such distance functions (one for each i).

69

The continuous aspect of the probabilistic NN queries

Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)

Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points.

[Figure: Example. Lower envelope LE12 of two distance-trajectories TR1 and TR2 starting at tb, with critical time-points t11 and t12. NOTE: time dimension on the horizontal axis, distance on the vertical axis]
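Why at most two intersections: setting diq(t) = djq(t) and squaring both sides leaves a quadratic equation in t. A minimal sketch of computing the critical time-points follows. It assumes each squared distance function is given by its coefficients (A, B, C) of A·t² + B·t + C; the helpers `coefficients` and `critical_time_points` and the sample numbers are illustrative, not from the source.

```python
import math


def coefficients(x0, y0, vx, vy):
    """Squared distance (x0 + vx*t)^2 + (y0 + vy*t)^2 as (A, B, C)."""
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []                 # identical or never intersecting
        roots = [-c / b]              # difference is linear: one intersection
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical relative positions and velocities with respect to the query object.
F1 = coefficients(-3, 1, 1, 0)    # (t - 3)^2 + 1
F2 = coefficients(-3, -3, 1, 1)   # 2 * (t - 3)^2
print(critical_time_points(F1, F2, 0.0, 5.0))  # [2.0, 4.0]
```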

70

The continuous aspect of the probabilistic NN queries

Q: How to efficiently construct the lower envelope of the whole collection of distance functions?

A: Divide-and-conquer approach, in the spirit of Merge-Sort

[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]

Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)

Now, how about the IPAC-NN tree?
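The merge-sort style construction of the lower envelope can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: every distance function is represented by the coefficients (A, B, C) of its squared distance A·t² + B·t + C (comparing squared distances gives the same envelope), an envelope is a list of (start, end, trajectory index) pieces, and all names and sample values are made up for the example.

```python
import math


def sq_dist(f, t):
    """Squared distance A*t^2 + B*t + C at time t."""
    return f[0] * t * t + f[1] * t + f[2]


def crossings(f, g, lo, hi):
    """Critical time-points of f and g strictly inside (lo, hi)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in set(roots) if lo < t < hi)


def merge(left, right, funcs):
    """Merge two envelopes that cover the same time interval."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lo = max(left[i][0], right[j][0])
        hi = min(left[i][1], right[j][1])
        fi, gj = left[i][2], right[j][2]
        cuts = [lo] + crossings(funcs[fi], funcs[gj], lo, hi) + [hi]
        for s, e in zip(cuts, cuts[1:]):
            mid = (s + e) / 2
            best = fi if sq_dist(funcs[fi], mid) <= sq_dist(funcs[gj], mid) else gj
            if out and out[-1][2] == best:
                out[-1] = (out[-1][0], e, best)   # extend the previous piece
            else:
                out.append((s, e, best))
        if left[i][1] == hi:
            i += 1
        if right[j][1] == hi:
            j += 1
    return out


def lower_envelope(indices, funcs, tb, te):
    """Divide and conquer: split the trajectories, recurse, merge."""
    if len(indices) == 1:
        return [(tb, te, indices[0])]
    mid = len(indices) // 2
    return merge(lower_envelope(indices[:mid], funcs, tb, te),
                 lower_envelope(indices[mid:], funcs, tb, te), funcs)


FUNCS = [(1.0, -6.0, 10.0), (2.0, -12.0, 18.0)]  # hypothetical TR1, TR2
envelope = lower_envelope([0, 1], FUNCS, 0.0, 5.0)
print(envelope)  # [(0.0, 2.0, 0), (2.0, 4.0, 1), (4.0, 5.0, 0)]
```

Each merge is a single left-to-right sweep over the two input envelopes, which is where the merge-sort analogy comes from.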

71

The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially)

Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree

[Figure: distance functions of TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band above the lower envelope]

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti.

Example: if the pdf of selecting a possible path is uniform, each possible path has probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t).

73

Combine the two probabilities…

[Figure: possible paths and possible locations of object a when t = 4, with the segment mp1 marked]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(S \cap PL_p(t))$ (e.g. uniform pdf)
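The 0.5 · 0.5 = 0.25 computation can be spelled out as a sum over possible paths. The sketch below is an illustration under assumptions of this example only: each path carries a path probability and one interval of possible locations (measured as an offset along the path), the location pdf is uniform on that interval, and the numbers are invented so that they reproduce the value on the slide.

```python
def overlap(a, b):
    """Length of the intersection of two intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths, segment_on_path):
    """Sum over paths: Pr(path) * Pr(location in segment | path).

    paths: {path_id: (path_probability, (lo, hi) possible locations at time t)}
    segment_on_path: {path_id: (lo, hi) part of the segment lying on that path}
    """
    total = 0.0
    for pid, (p_path, pl) in paths.items():
        seg = segment_on_path.get(pid)
        if seg is None:
            continue  # the segment does not lie on this path
        total += p_path * overlap(pl, seg) / (pl[1] - pl[0])
    return total


# Two equally likely paths; the segment covers half of the possible
# locations on path P1 and does not touch path P2.
paths = {"P1": (0.5, (2.0, 6.0)), "P2": (0.5, (1.0, 5.0))}
segment = {"P1": (2.0, 4.0)}
print(prob_in_segment(paths, segment))  # 0.25
```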

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need

– The collection of all the possible paths (and probabilities)

– Probabilistic Location Function

Example: observe the two possible sequences of going from position pi to position pi+1

At the time instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data Structures

– Edge hash-table

› Small, in-memory

– Movement R-tree

› 1D for each edge

– Trajectories List

› Store samples ordered by time-stamp

› Assign pointer to set of possible paths

› Earliest-arriving and latest-departure times stored for vertices

› On-disk, retrieve on-demand

76

› Network expansion

– Expansion tree: a tree that is rooted in q and contains the edges whose network distances to q are smaller than r

› Filtering

– Fetch the movement R-trees of the edges in the expansion tree

– Search leaf entries whose maximum time interval intersects tq

– The objects pointed to by those leaf entries are candidates

› Refinement

– The candidate set is considerably smaller than the original trajectory dataset

– Calculate the qualification probability for each candidate

[Figure: expansion tree "bubbling" along road-network segments from q over the vertices A, B, C, D, E, annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
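The network expansion step can be sketched with a bounded Dijkstra search. This is a hedged illustration only: the adjacency-list graph, the vertex names and the function `expansion_edges` are hypothetical, and an edge is taken to belong to the expansion tree when at least one of its end vertices has network distance smaller than r from q, which is one possible reading of the definition above.

```python
import heapq


def expansion_edges(graph, q, r):
    """Vertices with network distance < r from q, and the edges touching them."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    edges = set()
    for u in dist:
        for v, _ in graph[u]:
            edges.add(tuple(sorted((u, v))))
    return dist, edges


# Hypothetical road network: vertex -> list of (neighbour, travel cost).
graph = {
    "q": [("A", 2.0), ("D", 6.0)],
    "A": [("q", 2.0), ("B", 3.0)],
    "B": [("A", 3.0), ("C", 4.0)],
    "C": [("B", 4.0)],
    "D": [("q", 6.0)],
}
dist, edges = expansion_edges(graph, "q", 6.0)
print(dist)
print(sorted(edges))
```

Only the movement R-trees of the returned edges would then be fetched in the filtering step.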

77

Temporal-continuous query

– Only calculate the QPs for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction

– Why transmit every single location point?

› Bandwidth

› Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
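As a reminder of how such a reduction with an acceptable error works, here is a minimal sketch of the basic recursive Douglas-Peucker line simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and it ignores the time dimension; the sample points and the tolerance are made up.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the end points; keep an inner point only if it deviates by more than epsilon."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(track, 0.1)
print(simplified)  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The tolerance epsilon plays the role of the acceptable error: every dropped point lies within epsilon of the simplified polyline.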

79

Uncertainty – the other flip-side

› What are the important properties not to be distorted by the reduction?

– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?

– It is not a "Z" axis… one cannot travel back in time

– How long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea

– Discretize the time

– For each point in time, a pdf (pmf) over the possible positions is given

› Examples

– Nearest Neighbour Queries on uncertain moving objects

– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

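A highway example…

[Figure: position (pos) of a car over time (t), with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: pos over t with query window Q; the reachable positions are bounded by the speeds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: pos over t with vmin, vmax and query window Q]

Assuming uniform distribution…

86

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at the first and 30% at the second point in time]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

87

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at the first and 30% at the second point in time]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at the first and 30% at the second point in time]

Independence assumption: 1 – (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5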
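The gap between 0.65 and 0.5 can be made concrete with a toy set of possible worlds. The numbers below are an assumption for illustration only: ten equally likely trajectories, of which five are inside Q at the first point in time and three of those five are still inside Q at the second, so that the marginals are 50% and 30% as in the example.

```python
# Hypothetical possible worlds: ten equally likely trajectories, numbered 0..9.
worlds = range(10)
in_q_at_t1 = {0, 1, 2, 3, 4}   # 50% of the trajectories are in Q at the first time
in_q_at_t2 = {0, 1, 2}         # 30% are in Q at the second time (all were in Q before)

p1 = len(in_q_at_t1) / len(worlds)
p2 = len(in_q_at_t2) / len(worlds)

# Treating both points in time as independent overestimates the result.
independent = 1 - (1 - p1) * (1 - p2)

# Possible-worlds semantics: count the trajectories that are in Q at least once.
dependent = len(in_q_at_t1 | in_q_at_t2) / len(worlds)

print(round(independent, 2), dependent)  # 0.65 0.5
```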

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings

– Discrete Time + Discrete Space (e.g. Markov Chain)

– Discrete Time + Continuous Space (e.g. Harris Chain)

– Continuous Time + Discrete Space (e.g. Markov Process)

– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board, these probabilities are different

› This model is usually learned or given by experts

[Figure: inner hole: 0.4 to the left, 0.2 stay, 0.4 to the right; border hole: 0.4 stay, 0.6 to the neighbouring hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

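› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$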

94

How can we model this

With M the 9 × 9 transition matrix of the example:

› First hit:

$(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0)$

› Second hit:

$(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^2 = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0)$

› 40th hit:

$(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} = (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08)$
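The vector-matrix products above are easy to reproduce. The following is a minimal sketch in plain Python (no numerical library assumed); `board_matrix`, `step` and `after_hits` are illustrative helper names, and the border probabilities are read off the first and last row of M.

```python
def board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)
print([round(x, 2) for x in first])   # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in second])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 3) for x in fortieth])
```

Repeated multiplication keeps the cost linear in the number of hits; for a sparse M only the non-zero entries need to be touched.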

95

Fusion of Model and Reality

› Discretization of time and space

– E.g. treat intersections as states and add additional states on long streets

– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters

– Transition probabilities from one state to another are learned from historical data (very sparse matrix)

– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

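› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at least once in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$ (rows: from s1, s2, s3; columns: to s1, s2, s3)

[Figure: transition graph over the states s1, s2, s3 with the probabilities of M on its edges]

Note: We have an exponential number of possible paths the car might take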
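For a chain this small, the possible worlds can still be enumerated, which gives a reference value for the query. The sketch below is a brute-force illustration only, assuming the car is observed in s1 at t = 0; states are 0-indexed and the function name is made up. The number of candidate paths grows as |S| to the power of the horizon, which is exactly why this approach does not scale.

```python
from itertools import product

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def window_probability_by_enumeration(m, start, horizon, window_states, window_times):
    """Sum the probabilities of all paths that are in a window state at a window time."""
    total = 0.0
    for path in product(range(len(m)), repeat=horizon):
        prob, prev = 1.0, start
        for s in path:
            prob *= m[prev][s]
            prev = s
        if prob == 0.0:
            continue  # not a possible world
        states = (start,) + path
        if any(states[t] in window_states for t in window_times):
            total += prob
    return total


result = window_probability_by_enumeration(M, 0, 3, {0, 1}, (2, 3))
print(round(result, 2))  # 0.96
```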

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at least once in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$ (rows: from s1, s2, s3; columns: to s1, s2, s3)

State distribution over (s1, s2, s3):

t=0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at least once in the time interval T = [2, 3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$ (rows: from s1, s2, s3; columns: to s1, s2, s3)

State distribution over (s1, s2, s3):

t=0: (1, 0, 0)

t=1: (0, 0, 1)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

112

rsaquo Application RFID sensors track individuals in an indoor environment

rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office

rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions

Event queries

Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

113

Summary

rsaquo Models for spatial uncertainty

rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds

rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)

rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley

rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012

rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias

Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49

rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181

rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014

rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 69: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

69

The continuous aspect of the probabilistic NN queries

Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)

Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points

dist

TR1

TR2

LE12

tb t11 t12

Example

Lower envelope of two distance-trajectories

critical time-point

NOTE time-dimension on horizontal axis

70

The continuous aspect of the probabilistic NN queries

Q how to efficiently construct the lower envelope of the whole collection of distance-functions

A divide-and-conquer approach in the spirit of Merge-Sortdist

TR1

TR2

LE12

tb t11 t12

dist

TR3

TR4

LE34

tb tet31dist

TR1

TR2

LE1234

tb tet11 t12

TR3

TR4

t31

t1new t2new

Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)

Now how about the IPAC-NN tree

71

The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)

Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree

dist dist

lower envelope

TR1TR2

TR3

TR4

TR5

TR6

tb te

2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )

4r

TR7

t1 t2 t3

72

Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e. [formula not preserved]

Example: if the pdf of selecting a possible path is uniform [formula not preserved]

Given a path $P_j \in PP_i(a)$, the Possible Locations of a given moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ is the set of all the positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e. [formula not preserved]

73

Combine the two probabilities …

[Figure: possible paths of object a, and its possible locations when t = 4, including a point m]

Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(S, t, a) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(S \mid PL_p(t)\big) \qquad \text{(e.g. uniform pdf)}$$

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and probabilities)
– A Probabilistic Location Function

Example: observe the two possible sequences of going from position $p_i$ to position $p_{i+1}$. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieve on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A to E and query point q, annotated with earliest/latest times of arrival, a possible path that is definitely not part of the answer to the range query, and the expansion tree "bubbling" along road-network segments]
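The network expansion step can be pictured as a bounded shortest-path search. The following is a minimal sketch under stated assumptions: q sits exactly on a vertex, the road network is an undirected adjacency dictionary, and an edge counts as part of the expansion when one of its end vertices is closer to q than r. The graph and the function name are made up for illustration and are not taken from the slides.

```python
import heapq


def expansion_edges(graph, q, r):
    """Bounded Dijkstra from q.

    graph: dict mapping vertex -> list of (neighbour, edge_length).
    Returns (dist, edges): network distances of the vertices closer than r,
    and the set of edges incident to those vertices.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            edges.add(frozenset((u, v)))
            nd = d + w
            # Only vertices strictly closer than r are expanded further.
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


# Hypothetical road network with edge lengths.
roads = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 2.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [("C", 5.0)],
}
reached, tree_edges = expansion_edges(roads, "A", 3.0)
```

Only the movement R-trees of the returned edges would then need to be fetched in the filtering step.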

77

Temporal-continuous query

– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
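To make the "define an acceptable error, save on data size" idea concrete, here is a sketch of the basic recursive Douglas-Peucker simplification. It is the plain textbook variant, not the sped-up algorithm of the cited paper, and the sample tracks and the tolerance `epsilon` are invented for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical GPS tracks: one almost straight, one with a real detour.
noisy_line = [(0, 0), (1, 0.01), (2, -0.02), (3, 0.01), (4, 0)]
detour = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
simplified_line = douglas_peucker(noisy_line, 0.05)
simplified_detour = douglas_peucker(detour, 0.5)
```

The tolerance directly bounds the spatial error introduced by dropping samples, which is the uncertainty that the receiving side then has to deal with.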

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– Time is not just a "Z" axis … one cannot travel back in time
– For how long may the data at the sink be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension

vs.

Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443

83

A highway example …

[Figure: position (pos) of a car over time (t) with a query window Q. The reachable positions are bounded by the speed limits vmin and vmax. Assuming a uniform distribution over the reachable positions, the car lies inside Q with probability 50% at one time instant and 30% at another.]

What is the probability that the car is in some area at least once during some time?

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

This counts worlds that violate the speed constraints.

Consideration of dependency: = 0.5

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

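A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: board with a row of holes. From an inner hole the ball moves left with probability 0.4, stays with 0.2 and moves right with 0.4. From a border hole it stays with 0.4 and moves to its only neighbour with 0.6.]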

92

A simple example

› Initial position: the ball is in the third hole

› After first hit: 0.4, 0.2, 0.4 (holes 2 to 4)

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (holes 1 to 9)

93

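How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$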

94

How can we model this?

› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)$

› Second hit: $(0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0) \cdot M = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)$

› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)$
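The hit-by-hit distributions of the ball example can be reproduced with plain vector-matrix products. This is a small sketch in pure Python; the helper names are illustrative, and the 0-based index 2 stands for the third hole.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector dist with the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(dist, m, k):
    """Distribution after k hits, i.e. dist * M^k."""
    for _ in range(k):
        dist = step(dist, m)
    return dist


M = build_board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball in the third hole
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)

# The vector shown for the 40th hit is the stationary distribution of M:
# one more hit leaves it unchanged.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

Because M is sparse (three non-zero entries per row), the same computation scales to large state spaces when a sparse representation is used.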

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state graph over s1, s2 and s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take.

› The car starts in s1, so the state distribution evolves as follows:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)

› Worlds that are in s2 at t = 2 (probability 0.8) have already hit the window. Propagating only the remaining worlds gives (0, 0.16, 0.04) at t = 3, of which another 0.16 hits the window.

Result = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices

103

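ST - Window Queries

State vector over (s1, s2, s3, s4), where s4 is the new state:

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)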
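The window query on the three-state example can be written out as a sketch. It assumes that the matrix with the redirected transitions is used for every step that leads into a time instant of the window and the other matrix for all remaining steps, which reproduces the vectors of the example. The function names are illustrative, states are 0-based (s1 = 0), and the window is assumed to start at t ≥ 1.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def window_matrices(m, window_states):
    """Add an absorbing 'hit' state as the last state and build both matrices."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    # Outside the window nothing is redirected.
    m_minus = [row[:] + [0.0] for row in m] + [absorbing[:]]
    # Inside the window every transition into a window state goes to 'hit'.
    m_plus = []
    for row in m:
        hit = sum(p for j, p in enumerate(row) if j in window_states)
        kept = [0.0 if j in window_states else p for j, p in enumerate(row)]
        m_plus.append(kept + [hit])
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(m, start, window_states, t_from, t_to):
    """Probability of being in a window state at least once in [t_from, t_to]."""
    m_minus, m_plus = window_matrices(m, window_states)
    vec = [1.0 if i == start else 0.0 for i in range(len(m))] + [0.0]
    for t in range(1, t_to + 1):
        vec = vec_mat(vec, m_plus if t_from <= t <= t_to else m_minus)
    return vec[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
result = window_probability(M, start=0, window_states={0, 1}, t_from=2, t_to=3)
```

The cost is one vector-matrix product per time step, independent of the exponential number of possible paths.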

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]

106

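Multiple Observations

State vector over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix} \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$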

107

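Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory intersects the window and O the observation (its symbol was lost in extraction):

$$P(∎ \mid O) = \frac{P(O \mid ∎) \cdot P(∎)}{P(O)} = \frac{P(∎ \wedge O)}{P(O)} = \frac{P(O \wedge ∎)}{P(O \wedge ∎) + P(O \wedge \neg ∎)}$$

From the state vector (0, 0.16, 0.04, 0.48, 0, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:

$$\frac{0.32}{0.32 + 0.04} = 0.89$$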
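The two-class bookkeeping and the final conditioning step can be sketched as follows. One assumption is inferred from the numbers rather than stated on the slides: the printed vectors are reproduced when the redirecting matrix is applied only for the step into t = 2 and the block-diagonal matrix for the other two steps. States are 0-based, and all names are illustrative.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def class_matrices(m, window_states):
    """Matrices over (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            m_minus[i][j] = p          # not hit stays not hit
            m_minus[n + i][n + j] = p  # hit stays hit
            m_plus[n + i][n + j] = p   # hit stays hit
            if j in window_states:
                m_plus[i][n + j] = p   # entering the window: switch to '+'
            else:
                m_plus[i][j] = p
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
m_minus, m_plus = class_matrices(M, window_states={0, 1})

vec = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # t = 0: object in s1, window not hit
for mat in (m_minus, m_plus, m_minus):  # steps into t = 1, 2, 3 (assumed order)
    vec = vec_mat(vec, mat)

# Condition on the second observation: the object is seen in s3 at t = 3.
observed = 2
posterior = vec[n + observed] / (vec[observed] + vec[n + observed])
```

Conditioning only keeps the two classes that agree with the observation, so the posterior is a ratio of two entries of the final vector.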

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index leaf entry (P, oid) over a plot of location against time]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
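The ratio estimate can be illustrated on the three-state window example. This sketch only performs plain forward sampling from a single observation at t = 0; it does not implement the adapted transition matrices of the cited paper, which are what make samples conform with later observations. The function names, the seed and the sample size are arbitrary choices.

```python
import random


def sample_path(m, start, steps, rng):
    """Draw one trajectory (list of states) from the Markov chain m."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choices(range(len(m)), weights=m[current])[0])
    return path


def estimate_window_probability(m, start, window_states, t_from, t_to,
                                n_samples, seed=0):
    """Fraction of sampled trajectories inside the window at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, t_to, rng)
        if any(path[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, start=0, window_states={0, 1},
                                       t_from=2, t_to=3, n_samples=20000)
```

With 20000 samples the estimate should land close to the exact value 0.96 obtained by the matrix-based solution; the same sampling loop works for predicates such as nearest-neighbour conditions, for which no closed form is at hand.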

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435–443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal


rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley

rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012

rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias

Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49

rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181

rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014

rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

71

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).

Observation 3: If the lower envelope is removed from the (distance function, time) space, then the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.

[Figure: distance functions (dist) of TR1–TR7 over the interval [tb, te] with instants t1, t2, t3; marked are the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the pruning distance 4r.]
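The pruning rule can be illustrated with a minimal Python sketch. It assumes that the distance functions are sampled at common time instants and that the 4r threshold is measured from the lower envelope; the helper names and the sample values are hypothetical and not taken from the IPAC-NN work.

```python
def lower_envelope(dist_samples):
    """Pointwise minimum of all sampled distance functions."""
    series = list(dist_samples.values())
    return [min(values) for values in zip(*series)]


def prune_by_lower_envelope(dist_samples, r):
    """Keep trajectories that come within 4r of the lower envelope at some instant."""
    envelope = lower_envelope(dist_samples)
    survivors = set()
    for name, values in dist_samples.items():
        if any(d <= e + 4 * r for d, e in zip(values, envelope)):
            survivors.add(name)
    return envelope, survivors


# Hypothetical distances to the query trajectory at three time instants.
samples = {
    "TR1": [1.0, 2.0, 4.0],
    "TR2": [3.0, 1.5, 1.0],
    "TR6": [9.0, 5.0, 2.5],  # too far at first, within 4r at the last instant
    "TR7": [9.0, 9.5, 9.0],  # always more than 4r above the envelope
}
envelope, survivors = prune_by_lower_envelope(samples, r=0.5)
```

In this toy setting TR7 is pruned for the whole interval, while TR6 only becomes a candidate towards the end, which mirrors the "initially" remark above.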

72

Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.

Example: if the pdf of selecting a possible path is uniform, each possible path is selected with probability 1/|PPi(a)|.

Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at time t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
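The definition of possible paths can be sketched as a bounded path enumeration. This is a minimal sketch under assumptions: both samples lie on vertices, edge weights are minimum travel times, and the network, the vertex names, and the function `possible_paths` are hypothetical.

```python
def possible_paths(graph, source, target, budget):
    """Enumerate simple paths from source to target whose minimum
    travel time does not exceed the time budget t_{i+1} - t_i."""
    results = []

    def extend(node, path, cost):
        if cost > budget:
            return  # path is already too slow to be possible
        if node == target:
            results.append((list(path), cost))
            return
        for neighbor, travel_time in graph.get(node, {}).items():
            if neighbor not in path:  # simple paths only
                path.append(neighbor)
                extend(neighbor, path, cost + travel_time)
                path.pop()

    extend(source, [source], 0.0)
    return results


# Hypothetical road network: edge weights are minimum travel times.
road_network = {
    "v1": {"v5": 4.0, "v3": 2.0},
    "v3": {"v4": 2.0},
    "v4": {"v5": 1.0},
    "v5": {},
}
paths = possible_paths(road_network, "v1", "v5", budget=5.0)
uniform_probability = 1.0 / len(paths)  # uniform pdf over possible paths
```

With a tighter budget (for example 4.5) only the direct path remains possible.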

73

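Combine the two probabilities…

[Figure: road network showing the possible paths and the possible locations of object a when t = 4, with the segment m–p1 highlighted]

Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

$$\Pr(a \in S \text{ at } t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a \in S \cap PL_p(t)\big)$$

where PL_p(t) denotes the possible locations on path p at time t (e.g., with a uniform pdf).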
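The combination of the two probabilities can be written as a small Python sketch. The path names and the per-path values are hypothetical and chosen to reproduce the 0.5 · 0.5 = 0.25 example; the function name is an assumption as well.

```python
def probability_in_segment(path_probabilities, location_probabilities):
    """Sum over paths of Pr(path) * Pr(object lies in the segment | path)."""
    return sum(
        path_probabilities[path] * location_probabilities.get(path, 0.0)
        for path in path_probabilities
    )


# Hypothetical values: two equally likely paths; on path "P1" the segment
# covers half of the possible locations at t = 4, on "P2" it covers none.
path_probabilities = {"P1": 0.5, "P2": 0.5}
location_probabilities = {"P1": 0.5, "P2": 0.0}
result = probability_in_segment(path_probabilities, location_probabilities)
```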

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; the expansion tree “bubbling” along road-network segments]
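The expansion and filtering steps can be sketched in Python as follows. This is only an illustration under assumptions: the network expansion is done with a distance-bounded Dijkstra search, the per-edge movement R-trees are replaced by plain lists of (object, start, end) entries, tq is treated as an interval, and the graph, the entries, and all function names are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra: vertices closer to q than r, plus every edge
    that starts at such a vertex (i.e., lies partly within distance r)."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.add(tuple(sorted((u, v))))
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


def filter_candidates(movements, edges, tq):
    """Objects with a movement entry on an expanded edge whose time
    interval intersects the query interval tq = (start, end)."""
    q_start, q_end = tq
    candidates = set()
    for edge in edges:
        for obj, t_start, t_end in movements.get(edge, []):
            if t_start <= q_end and q_start <= t_end:
                candidates.add(obj)
    return candidates


# Hypothetical undirected road network (weights are network distances).
graph = {
    "q": {"A": 1.0},
    "A": {"q": 1.0, "B": 2.0, "D": 4.0},
    "B": {"A": 2.0, "C": 5.0},
    "C": {"B": 5.0},
    "D": {"A": 4.0, "E": 1.0},
    "E": {"D": 1.0},
}
# Hypothetical stand-in for the per-edge movement R-trees.
movements = {
    ("A", "B"): [("o1", 0, 5), ("o2", 8, 9)],
    ("D", "E"): [("o3", 2, 4)],
}
dist, edges = expansion_tree(graph, "q", r=4.0)
candidates = filter_candidates(movements, edges, tq=(3, 6))
```

Only the surviving candidates would then go through the refinement step, where the qualification probability is computed.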

77

Temporal-continuous query

– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up the Douglas-Peucker Line-Simplification Algorithm”, SSHD 1997
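As an illustration of error-bounded data reduction, here is a minimal sketch of the basic recursive Douglas-Peucker line simplification in Python. It is the textbook variant, not the sped-up algorithm of the cited paper; the track coordinates and the tolerance are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical track of location samples; epsilon is the acceptable error.
track = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(track, epsilon=0.5)
```

The dropped sample (1, 0.1) lies within the acceptable error of the simplified line, so it would not need to be transmitted.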

79

Uncertainty – the other flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long should the sink data be allowed to be “un-fresh”?

80

Outline

› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443

83

A highway example…

[Figure: position (pos) over time (t) of a car on a highway; the reachable positions are bounded by the speeds vmin and vmax; Q is the query window, which covers 50% of the possible positions at the first query time and 30% at the second]

What is the probability that the car is in some area at least once during some time interval?

Assuming a uniform distribution…

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

This result violates the speed constraints.

Consideration of dependency: = 0.5
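The gap between the two results can be reproduced with a few lines of Python. The dependent case rests on an assumption made here for illustration: because of the speed constraints, every world that is inside Q at the second instant was already inside Q at the first instant, so the joint probability equals the smaller of the two values.

```python
p_first, p_second = 0.5, 0.3  # probabilities of being inside Q at the two instants

# Independence assumption: the two instants are treated as unrelated events.
independent = 1 - (1 - p_first) * (1 - p_second)

# Assumed dependency: "inside Q at the second instant" implies
# "inside Q at the first instant", hence P(A and B) = P(B).
joint = p_second
dependent = p_first + p_second - joint  # inclusion-exclusion for P(A or B)
```

Under this assumption the independent computation overestimates the true probability (0.65 instead of 0.5), because it counts worlds that no car obeying the speed limits could follow.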

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; interior hole: 0.4 (left neighbour), 0.2 (stay), 0.4 (right neighbour); border hole: 0.6 (neighbour), 0.4 (stay)]

92

A simple example

› Initial position
› After first hit: 0.4, 0.2, 0.4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

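How can we model this

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$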

94

How can we model this

› First hit:

$$(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0)$$

› Second hit:

$$(0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) \cdot M = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0)$$

› 40th hit:

$$(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} = (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08)$$

where M is the 9 × 9 transition matrix of the wooden-board example.
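The vector-matrix products can be reproduced with a short, dependency-free Python sketch. The function names are hypothetical; the matrix is the 9 × 9 transition matrix of the wooden-board example. Note that the values shown for the 40th hit coincide with the stationary distribution of this chain, which the sketch approaches as the number of hits grows.

```python
def build_transition_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(vector, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(vector[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(vector, M, hits):
    """Distribution over the holes after the given number of hits."""
    for _ in range(hits):
        vector = step(vector, M)
    return vector


M = build_transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third hole
after_first = propagate(start, M, 1)
after_second = propagate(start, M, 2)
after_40th = propagate(start, M, 40)
```

Because only a few entries per row are non-zero, a sparse representation would be the natural choice for larger state spaces.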

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
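One simple way to learn transition probabilities from historical data is to count observed state changes and normalize per state. The sketch below assumes this counting estimator and hypothetical, already discretized state sequences; the source does not specify how the parameters are actually estimated.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(state_sequences):
    """Relative frequencies of observed transitions, stored sparsely
    as {from_state: {to_state: probability}}."""
    counts = defaultdict(Counter)
    for sequence in state_sequences:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        matrix[current] = {nxt: c / total for nxt, c in followers.items()}
    return matrix


# Hypothetical historical trajectories, one state per tick.
history = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3", "s2", "s1"],
]
M = estimate_transition_matrix(history)
```

The dictionary-of-dictionaries layout only stores non-zero entries, which fits the very sparse matrix mentioned above.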

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3, unrolled over t = 0, 1, 2, 3]

Note: We have an exponential number of possible paths the car might take.

State vectors over (s1, s2, s3), starting in s1:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36) when all worlds are propagated

Worlds that are in s2 at t = 2 (probability 0.8) already satisfy the query. Propagating only the remaining worlds (0, 0, 0.2) yields (0, 0.16, 0.04) at t = 3, so the result is 0.8 + 0.16 = 0.96.

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

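ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

States: s1, s2, s3, and the new state s4 for the winner trajectories.

$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$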
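The construction of the two matrices and the resulting computation can be sketched in Python. The assumptions are labeled in the code: transitions into a time step inside the query interval use M+, all others use M−, and t = 0 lies outside the interval. Function and variable names are hypothetical.

```python
def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def augmented_matrices(M, window_states):
    """Add an absorbing 'winner' state; M_plus redirects every transition
    into a window state to that new state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    M_minus = [list(row) + [0.0] for row in M] + [absorbing]
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window_states:
                new_row[n] += p  # the world hits the window
            else:
                new_row[j] = p
        M_plus.append(new_row)
    M_plus.append(absorbing)
    return M_minus, M_plus


def window_query(M, start, window_states, window_times, horizon):
    """Assumes t = 0 is outside the query interval."""
    M_minus, M_plus = augmented_matrices(M, window_states)
    vector = list(start) + [0.0]
    for t in range(1, horizon + 1):
        vector = vec_mat(vector, M_plus if t in window_times else M_minus)
    return vector


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
final = window_query(M, [1.0, 0.0, 0.0], window_states={0, 1},
                     window_times={2, 3}, horizon=3)
result = final[-1]  # probability mass collected in the winner state
```

The number of multiplications grows linearly with the time horizon, in contrast to the exponential number of possible paths.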

104

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two panels with the axes “location space” and “time space”; the first panel shows a single observation at t0, the second shows observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

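Multiple Observations

Classes of equivalent worlds: S1-, S2-, S3- (the window has not been hit, ¬∎) and S1+, S2+, S3+ (the window has been hit, ∎).

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

State vectors over (S1-, S2-, S3-, S1+, S2+, S3+):
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)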

107

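Bayes’ Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ = “the trajectory passes the query window” and obs = “the object was seen in s3”:

$$P(\blacksquare \mid \mathit{obs}) = \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})} = \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})} = \frac{P(\mathit{obs} \wedge \blacksquare)}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)}$$

From the state vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):

$$\frac{0.32}{0.32 + 0.04} \approx 0.89$$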
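The class-based computation and the final Bayes step can be sketched in Python. One assumption is inferred from the numbers rather than stated in the source: the state vectors of the example are reproduced when the three transitions use M−, M+, M− in this order, i.e., when only the transition into t = 2 can hit the window. All names are hypothetical.

```python
def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def class_matrices(M, window_states):
    """Matrices over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zero = [0.0] * n
    # M_minus: worlds keep their class (block-diagonal matrix).
    M_minus = [list(row) + zero for row in M] + [zero + list(row) for row in M]
    # M_plus: a '-' world entering a window state moves to the '+' class.
    M_plus = []
    for row in M:
        stay_minus = [0.0 if j in window_states else p for j, p in enumerate(row)]
        move_plus = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        M_plus.append(stay_minus + move_plus)
    for row in M:
        M_plus.append(zero + list(row))  # '+' worlds stay '+'
    return M_minus, M_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus, M_plus = class_matrices(M, window_states={0, 1})

vector = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # object observed in s1 at t = 0
for matrix in (M_minus, M_plus, M_minus):  # assumed order, see lead-in
    vector = vec_mat(vector, matrix)

n, observed = 3, 2  # second observation: object seen in s3 (index 2)
p_hit_and_obs = vector[n + observed]   # mass in S3+
p_miss_and_obs = vector[observed]      # mass in S3-
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
```

Conditioning on the observation amounts to discarding all classes that disagree with it and renormalizing the remaining mass.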

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries in the space spanned by the axes “time” and “location”, with the label Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
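The sampling idea and its efficiency problem can be illustrated with naive rejection sampling in Python. This is not the adapted-transition-matrix technique of the cited paper: it simply discards every sampled trajectory that conflicts with the second observation. The setting (three states, start in s1, object seen in s3 at t = 3, window covering s1 and s2 at t = 2), the seed, and the function names are assumptions.

```python
import random

# Transition probabilities of the three-state example, stored sparsely.
M = {
    "s1": {"s3": 1.0},
    "s2": {"s1": 0.6, "s3": 0.4},
    "s3": {"s2": 0.8, "s3": 0.2},
}


def sample_trajectory(rng, start, steps):
    """Draw one possible trajectory from the Markov chain."""
    states = [start]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(list(row), weights=list(row.values()))[0])
    return states


def estimate_hit_probability(rng, n_samples):
    """Fraction of accepted samples that are in s1 or s2 at t = 2."""
    accepted = hits = 0
    for _ in range(n_samples):
        trajectory = sample_trajectory(rng, "s1", 3)
        if trajectory[3] != "s3":
            continue  # rejected: conflicts with the observation at t = 3
        accepted += 1
        if trajectory[2] in ("s1", "s2"):
            hits += 1
    return hits / accepted, accepted


rng = random.Random(42)
estimate, accepted = estimate_hit_probability(rng, 20000)
```

Roughly two thirds of the samples are thrown away in this toy setting, and the waste grows with the number of observations, which is what motivates drawing samples that conform to the observations from the start.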

112

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 72: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

72

Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie

Example if pdf of selecting a possible path is uniform

Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie

73

Combine the two probabilitieshellip

Possible pathsPossible locations when t=4

a

m

Probability of falling inside segment mp1

0505 = 025 (at t = 4)

Probability of an object a falling inside some segment S at given time instant t

)(

))(Pr()Pr())(Pr(tPPp

p

a

tPLSpSta eg uniform pdf

74

ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory

representation we needndash The collection of all the possible paths (and probabilities)

ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi

to position pi+1

At the time-instant t = 4 the object can

be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )

75

Processing

rsaquo Data Structuresndash Edge hash-table

rsaquo Small in-memoryndash Movement R-tree

rsaquo 1D for each edgendash Trajectories List

rsaquo Store samples ordered by time-stamp

rsaquo Assign pointer to set of possible paths

rsaquo Earliest-arriving and Latest-departure stored for vertices

rsaquo On-disk retrieve on-demand

76

rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges

whose network distances with q are smaller than r

rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect

with tq

ndash The objects pointed by those leaf entries are candidates

rsaquo Refinementndash The candidate set is considerably smaller than original

trajectory datasetndash Calculate the qualification probability for each candidate

AB

C

D E

q

earliestlatest times of arrival possible path but definitely not part of the

answer to the range query

expansion tree ldquobubblingrdquoalong road-network segments

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway example …

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time interval?

84

A highway example …

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example …

[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin and vmax]

Assuming a uniform distribution …

86

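A highway example …

[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one timestamp and 30% at another]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65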

87

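A highway example …

[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one timestamp and 30% at another]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints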

88

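A highway example …

[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one timestamp and 30% at another]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5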
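
The gap between 0.65 and 0.5 can be reproduced by counting possible worlds. The sketch below assumes ten equally likely trajectories, chosen only so that the marginals match the 50% and 30% of the example; the assumption that every trajectory inside Q at the second timestamp was already inside Q at the first one stands in for the speed constraints and is not taken from the slides.

```python
# Ten equally likely possible trajectories (hypothetical possible worlds).
# Each entry records whether the car is inside Q at the two timestamps.
worlds = (
    [(True, True)] * 3      # inside Q at both timestamps
    + [(True, False)] * 2   # inside Q only at the first timestamp
    + [(False, False)] * 5  # never inside Q
)

p_t1 = sum(w[0] for w in worlds) / len(worlds)   # marginal at the first timestamp
p_t2 = sum(w[1] for w in worlds) / len(worlds)   # marginal at the second timestamp

# Treating the two timestamps as independent events.
p_independent = 1 - (1 - p_t1) * (1 - p_t2)

# Counting the worlds that are inside Q at least once.
p_dependent = sum(w[0] or w[1] for w in worlds) / len(worlds)

print(p_t1, p_t2, p_independent, p_dependent)
```

The independent computation implicitly counts trajectories that are outside Q at the first timestamp and inside Q at the second, which do not exist in this set of worlds.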

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: inner hole: left 0.4, stay 0.2, right 0.4; border hole: 0.6 and 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

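How can we model this?

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]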

94

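How can we model this?

› First hit:

\[ (0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \]

› Second hit:

\[ (0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{2} = (0.16 \;\; 0.16 \;\; 0.36 \;\; 0.16 \;\; 0.16 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \]

› 40th hit:

\[ (0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{40} \approx (0.08 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.08) \]

Here M is the 9 × 9 transition matrix of the wooden-board example (rows: from bucket, columns: to bucket).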
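
The three products can be checked with a few lines of plain Python. This is a minimal sketch; the function names `build_board_matrix`, `step` and `after_hits` are made up for the illustration, and only the matrix entries come from the example.

```python
def build_board_matrix(n=9):
    """Transition matrix of the wooden-board example (row = from, column = to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border hole
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border hole
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                           # inner hole: left, stay, right
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: multiply the row vector dist with M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, M, hits):
    """Distribution over the holes after the given number of hits."""
    for _ in range(hits):
        dist = step(dist, M)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # the ball starts in the third hole

for hits in (1, 2, 40):
    print(hits, [round(p, 2) for p in after_hits(start, M, hits)])
```

With a growing number of hits the vector approaches (0.08, 0.12, …, 0.12, 0.08), which is the stationary distribution of this chain and no longer depends on the initial hole.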

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
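
One simple way to learn transition probabilities from historical data is to count consecutive state pairs and normalize each row. The sketch below assumes this maximum-likelihood counting; the slides do not say which estimator was actually used, and the `history` sequences are invented. A nested dictionary keeps only non-zero entries, which suits a very sparse matrix.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Estimate P(next state | current state) by counting consecutive pairs.

    trajectories: iterable of state sequences, one state per tick.
    Returns a sparse nested dict {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for states in trajectories:
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        matrix[current] = {nxt: c / total for nxt, c in successors.items()}
    return matrix

# Hypothetical historical data: each list is one car, one state per tick.
history = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s3", "s3", "s2", "s3", "s2"],
    ["s2", "s1", "s3", "s2", "s3"],
]
for state, row in sorted(estimate_transitions(history).items()):
    print(state, row)
```

Separate matrices per time of day or per object group can be obtained by calling the function on the corresponding subsets of the history.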

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: transition graph over the states s1, s2, s3; edges are labelled with the transition probabilities of M]

Note: we have an exponential number of possible paths the car might take

ST - Window Queries

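› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3; edges are labelled with the transition probabilities of M]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)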

98

99

ST - Window Queries

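› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3; edges are labelled with the transition probabilities of M]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)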

100

ST - Window Queries

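› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3; edges are labelled with the transition probabilities of M]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)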

101

ST - Window Queries

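› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3; edges are labelled with the transition probabilities of M]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)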

102

ST - Window Queries

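› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3; edges are labelled with the transition probabilities of M]

State distribution over (s1, s2, s3):
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0, 0.16, 0.04), propagating only the worlds that were not inside the window at t=2

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices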

103

ST - Window Queries

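\[
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

States: s1, s2, s3, and the new state s4

\[
(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)
\]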
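
The two augmented matrices can be derived mechanically from M. The sketch below builds them, runs the three multiplications for T = [2,3], and cross-checks the result by enumerating all paths explicitly. The helper names (`augmented`, `vec_mat`) and the 0-based state indices are choices made for this illustration.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}      # s1 and s2 (0-based indices)
WINDOW_TICKS = {2, 3}       # T = [2, 3]
HIT = 3                     # index of the extra absorbing state s4

def augmented(M, redirect):
    """Add the absorbing hit state; if redirect is set, every transition
    into a window state is redirected to the hit state."""
    n = len(M)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = HIT if (redirect and j in WINDOW_STATES) else j
            A[i][target] += M[i][j]
    A[HIT][HIT] = 1.0       # once a hit, always a hit
    return A

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

M_minus, M_plus = augmented(M, False), augmented(M, True)

v = [1.0, 0.0, 0.0, 0.0]    # car observed in s1 at t = 0
for t in range(1, 4):
    v = vec_mat(v, M_plus if t in WINDOW_TICKS else M_minus)
p_matrix = v[HIT]

# Cross-check by enumerating every path s(0), ..., s(3) explicitly.
p_paths = 0.0
for path in product(range(3), repeat=3):
    states = (0,) + path
    prob = 1.0
    for a, b in zip(states, states[1:]):
        prob *= M[a][b]
    if any(states[t] in WINDOW_STATES for t in WINDOW_TICKS):
        p_paths += prob

print(round(p_matrix, 4), round(p_paths, 4))
```

Both computations give 0.96. The enumeration touches a number of paths that grows exponentially with the number of ticks, while the matrix version needs one vector-matrix product per tick.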

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first with one observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]

[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3]

106

Multiple Observations

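[Figure: the states s1, s2, s3 unrolled over the timestamps t=0, t=1, t=2, t=3]

State vector over (S1-, S2-, S3-, S1+, S2+, S3+); the classes S1-, S2-, S3- are the “not ∎” worlds, i.e. those that have not intersected the query window ∎:
t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]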

107

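Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting “the trajectory passes the query window” and obs denoting the observation in s3:

\[
P(\blacksquare \mid \text{obs})
= \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})}
= \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})}
= \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}
\]

State vector over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89
\]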
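
The doubled state space and the final conditioning step can be sketched as follows. The code assumes that the second observation (object seen in s3) is made at t = 3, which is the timestamp whose state vector enters the computation above; the helper names `doubled` and `vec_mat` are illustrative.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
N = len(M)
WINDOW_STATES = {0, 1}      # s1 and s2 (0-based indices)
WINDOW_TICKS = {2, 3}       # T = [2, 3]

def doubled(M, in_window_tick):
    """6x6 matrix over (S1-, S2-, S3-, S1+, S2+, S3+)."""
    A = [[0.0] * (2 * N) for _ in range(2 * N)]
    for i in range(N):
        for j in range(N):
            hit = in_window_tick and j in WINDOW_STATES
            A[i][j + N if hit else j] += M[i][j]   # "-" worlds may become "+"
            A[i + N][j + N] += M[i][j]             # "+" worlds stay "+"
    return A

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # first observation: s1 at t = 0
for t in range(1, 4):
    v = vec_mat(v, doubled(M, t in WINDOW_TICKS))

observed = 2                          # assumed second observation: s3 at t = 3
p_hit_and_obs = v[observed + N]       # mass in S3+
p_miss_and_obs = v[observed]          # mass in S3-
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)

print([round(x, 2) for x in v], round(posterior, 2))
```

Keeping the position in both the hit and the non-hit worlds is what makes the division possible: a single absorbing hit state would forget where the hit worlds are located.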

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on the R-tree, indexing the ST space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: axes time and location; label “Poid”]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
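
The sampling idea can be tried on the three-state window query, where the exact answer (0.96) is known. This is a sketch of plain forward sampling from a single observation; it is not the adapted-matrix sampling of the cited paper, which additionally makes the samples consistent with later observations. The sample size and the seed are arbitrary.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_path(start, ticks, rng):
    """Draw one possible trajectory by walking the Markov chain forward."""
    path = [start]
    for _ in range(ticks):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def estimate_window_probability(samples, rng):
    """Fraction of sampled trajectories that are in s1 or s2 during T = [2, 3]."""
    hits = 0
    for _ in range(samples):
        path = sample_path(start=0, ticks=3, rng=rng)
        if path[2] in (0, 1) or path[3] in (0, 1):
            hits += 1
    return hits / samples

rng = random.Random(42)
print(estimate_window_probability(10_000, rng))
```

The estimate fluctuates around the exact value; its standard error shrinks with the square root of the number of samples.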

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
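
The relation to regular expressions can be illustrated with an ordinary Python regex over a string of location symbols. This is only an analogy and not Lahar syntax; the one-letter encoding, the pattern and the three weighted location sequences are all made up for the illustration.

```python
import re

# One symbol per time tick (hypothetical encoding):
# O = Joe's office, H = hallway, C = coffee room.
COFFEE_EVENT = re.compile(r"O+[^OC]*C+[^OC]*O")

def got_coffee(locations):
    """True if the sequence contains office -> coffee room -> office."""
    return COFFEE_EVENT.search(locations) is not None

print(got_coffee("OOHCCHO"))   # office, hallway, coffee room, hallway, office
print(got_coffee("OOHHOO"))    # never reaches the coffee room

# Possible worlds semantics: the event probability is the total probability
# of the possible location sequences that match the pattern.
worlds = {"OOHCCHO": 0.5, "OOHHOOO": 0.3, "OHCHHHO": 0.2}
p_event = sum(p for seq, p in worlds.items() if got_coffee(seq))
print(p_event)
```

Enumerating the worlds is only feasible for tiny examples; avoiding this enumeration on correlated streams is the subject of the cited paper.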

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities …
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example …
  • A highway example … (2)
  • A highway example … (3)
  • A highway example … (4)
  • A highway example … (5)
  • A highway example … (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes’ Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
73

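Combine the two probabilities …

[Figure: possible paths and possible locations of object a when t = 4; segment m]

Probability of falling inside segment m (p1): 0.5 · 0.5 = 0.25 (at t = 4)

Probability of an object a falling inside some segment S at a given time instant t:

\[
\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(PL_{p}^{a}(t) \in S\bigr)
\]

where PP(t) is the set of possible paths, Pr(p) the probability of path p, and PL the probabilistic location function (e.g., a uniform pdf)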

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– Probabilistic location function

Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)

75

Processing

› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to the set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with objects A, B, C, D, E and query point q. Annotations: earliest/latest times of arrival; possible path, but definitely not part of the answer to the range query; expansion tree “bubbling” along road-network segments]
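
The network expansion step can be sketched as a bounded Dijkstra search from q. The code below assumes that q lies on a vertex and that an edge counts as part of the expansion when its nearer endpoint is closer than r; the toy `graph` and its edge lengths are invented and do not reproduce the network of the figure.

```python
import heapq

def expansion_tree(graph, q, r):
    """Expand the road network from q up to network distance r.

    graph: {vertex: {neighbour: edge_length}} (undirected road network).
    Returns (dist, edges): the network distance of every vertex closer
    than r, and the edges incident to those vertices.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, length in graph[u].items():
            edges.add(frozenset((u, v)))  # part of this edge lies within r
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges

# Hypothetical road network.
graph = {
    "q": {"A": 2, "B": 5},
    "A": {"q": 2, "C": 2},
    "B": {"q": 5, "D": 1},
    "C": {"A": 2, "E": 4},
    "D": {"B": 1},
    "E": {"C": 4},
}
dist, edges = expansion_tree(graph, "q", r=4.5)
print(dist)
print(sorted(tuple(sorted(e)) for e in edges))
```

Only the movement R-trees of the returned edges would have to be fetched in the filtering step; the edge between B and D is never touched because B is farther than r from q.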

77

Temporal-continuous query

– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path

– It is easy to prove that the actual QP is monotonic between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

Page 74: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

74

Construction/Representation

› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function

Example: observe the two possible sequences of going from position pi to position pi+1.

At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).

75

Processing

› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Store samples ordered by timestamp
  › Assign a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand

76

› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree is "bubbling" along road-network segments. Annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query.]
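The expansion and filtering steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the road network is a plain adjacency dictionary, a dictionary of per-edge entry lists stands in for the movement R-trees, and the names `expansion_edges` and `filter_candidates` are hypothetical, not taken from the tutorial.

```python
import heapq


def expansion_edges(graph, q, r):
    """Collect the edges that start within network distance r of vertex q.

    graph: dict vertex -> list of (neighbor, edge_length); an undirected
    network lists every edge in both directions (assumed layout).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            # u is closer than r to q, so at least part of (u, v) is in range.
            edges.add((min(u, v), max(u, v)))
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(edge_entries, edges, tq):
    """Return ids of objects whose time interval on an expanded edge meets tq.

    edge_entries: dict edge -> list of (object_id, t_min, t_max); a simple
    stand-in for the leaf entries of the per-edge movement R-trees.
    """
    t_lo, t_hi = tq
    return {
        oid
        for edge in edges
        for oid, t_min, t_max in edge_entries.get(edge, [])
        if t_min <= t_hi and t_lo <= t_max
    }


graph = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 2.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [("C", 5.0)],
}
edge_entries = {
    ("A", "B"): [("o1", 0, 5), ("o2", 8, 9)],
    ("C", "D"): [("o3", 0, 10)],
}
edges = expansion_edges(graph, "A", 3.0)
candidates = filter_candidates(edge_entries, edges, (4, 6))
```

Only the candidates returned by the filter would then go through the refinement step, where the qualification probability is computed.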

77

Temporal-continuous query

– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
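To make the "define an acceptable error, save on data size" idea concrete, here is a minimal sketch of the classic recursive Douglas-Peucker simplification. It is the textbook variant, not the sped-up algorithm of the cited paper, and the function names and the use of point-to-segment distance are assumptions made for this illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Drop points that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(track, 0.5)
```

The error bound epsilon plays the role of the acceptable error: every dropped location lies within epsilon of the simplified polyline, so the receiver knows how uncertain the reduced trajectory is.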

79

Uncertainty – the other-flip-side

› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– Time is not just a "Z" axis … one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: „Probabilistic Similarity Search for Uncertain Time Series“. SSDBM 2009: 435-443

83

A highway example…

[Figure: space-time diagram with axes t and pos and a query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: space-time diagram with axes t and pos, query window Q, and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: space-time diagram with axes t and pos, query window Q, and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

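A highway example…

[Figure: space-time diagram with axes t and pos and query window Q; the probabilities of being inside Q at the two time steps are 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65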

87

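A highway example…

[Figure: space-time diagram with axes t and pos and query window Q; the probabilities of being inside Q at the two time steps are 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints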

88

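A highway example…

[Figure: space-time diagram with axes t and pos and query window Q; the probabilities of being inside Q at the two time steps are 50% and 30%]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: = 0.5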
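The two numbers of the highway example can be reproduced with a few lines. This is a sketch under one explicit assumption that the slide does not spell out: in the dependent case, every possible trajectory that is inside Q at the second time step was also inside Q at the first one, so the "at least once" events are nested. The function names are hypothetical.

```python
def at_least_once_independent(probs):
    """P(inside Q at least once) if the time steps were independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def at_least_once_nested(probs):
    """P(inside Q at least once) if the per-step events are nested.

    Assumption for this sketch: the set of worlds hitting Q at one step
    contains the sets of worlds hitting Q at all other steps.
    """
    return max(probs)


independent = at_least_once_independent([0.5, 0.3])
dependent = at_least_once_nested([0.5, 0.3])
```

The gap between the two values is the error introduced by ignoring the dependency between consecutive positions of the same car.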

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

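› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with a row of holes; for an inner hole the probabilities are 0.4 (left neighbour), 0.2 (stay), 0.4 (right neighbour); at the border they are 0.6 and 0.4]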

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

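How can we model this?

› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]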

94

How can we model this?

With M the 9×9 transition matrix of the wooden-board example:

› First hit
\[ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \]

› Second hit
\[ (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0) \]

› 40th hit
\[ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08) \]
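The vector-matrix products above are easy to replay in code. The following is a small sketch in plain Python (no linear-algebra library assumed); the helper names `build_board_matrix`, `step` and `propagate` are made up for this illustration.

```python
def build_board_matrix(n=9):
    """Transition matrix of the wooden-board example (row: from, column: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: multiply the row vector dist with the matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, hits):
    """Distribution over the holes after the given number of hits."""
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third hole
after_one = propagate(start, M, 1)
after_two = propagate(start, M, 2)
after_forty = propagate(start, M, 40)

# The rounded vector shown for the 40th hit is a fixed point of M.
stationary = [0.08] + [0.12] * 7 + [0.08]
```

Multiplying `stationary` by M returns the same vector, which is why the distribution drifts toward these values as the number of hits grows.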

95

Fusion of Model and Reality

› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
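One simple way to learn transition probabilities from historical data is to count observed state-to-state transitions and normalize each row. The sketch below assumes this counting estimator and a dictionary-of-dictionaries layout for the sparse matrix; neither is prescribed by the tutorial, and `estimate_transitions` is a hypothetical name.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Estimate sparse transition probabilities from state sequences.

    trajectories: iterable of state sequences, one state per tick.
    Returns dict from_state -> dict to_state -> probability; transitions
    that were never observed are simply absent (sparse representation).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in trajectories:
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs


history = [
    ["s1", "s3", "s3", "s2"],
    ["s1", "s3", "s2", "s1"],
]
learned = estimate_transitions(history)
```

Separate matrices for different times of day or object groups could be obtained by calling the function on the corresponding subsets of the history.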

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: transition graph over the states s1, s2, s3, edges labelled with the transition probabilities]

Note: we have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

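› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.0, 0.16, 0.04)

Only the mass 0.2 that is still outside the window at t=2 is propagated to t=3, and of it only 0.04 stays in s3, so Result = 1 - 0.04 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices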

103

ST - Window Queries

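\[
M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

State order of the vectors: s1, s2, s3, s4 (s4 is the new state for the winner trajectories)

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)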
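The construction with one extra absorbing state and the two matrices can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes that M+ is applied for every transition whose target time lies inside the query interval and M- otherwise, and that the interval starts at t >= 1. All function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing state for hit trajectories."""
    n = len(M)
    absorbing_row = [0.0] * n + [1.0]
    # M-: ordinary transitions, hit trajectories stay in the extra state.
    M_minus = [list(row) + [0.0] for row in M] + [absorbing_row]
    # M+: transitions into a window state are redirected to the extra state.
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window_states:
                new_row[n] += p
            else:
                new_row[j] = p
        M_plus.append(new_row)
    M_plus.append(absorbing_row)
    return M_minus, M_plus


def window_query(M, start, window_states, t_begin, t_end):
    """Probability of visiting a window state at some time in [t_begin, t_end].

    Assumes t_begin >= 1, so the start distribution itself is never a hit.
    """
    M_minus, M_plus = window_matrices(M, window_states)
    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t >= t_begin else M_minus)
    return v[-1]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
result = window_query(M, [1, 0, 0], {0, 1}, 2, 3)  # window: s1 or s2 during T = [2, 3]
```

Because hit trajectories are collected in a single state, the cost is one vector-matrix product per time step instead of an enumeration of all possible paths.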

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two diagrams of location space over time space; the left one with a single observation at t0, the right one with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \]

[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

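[Figure: the states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3; worlds that intersect the window are marked ∎, the others not ∎]

Class order of the vectors: S1-, S2-, S3-, S1+, S2+, S3+

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\]

\[
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]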

107

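Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting "the trajectory intersects the query window" and obs denoting the observation:

\[
P(\blacksquare \mid \mathrm{obs})
= \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}
\]

Class vector at t=3 (order S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

\[ \frac{0.32}{0.32 + 0.04} \approx 0.89 \]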
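The class-based propagation and the final conditioning step can be sketched like this. It is a minimal illustration assuming the class order S1-, S2-, S3-, S1+, S2+, S3+, a query interval that starts at t >= 1, and an observation taken at the last time step; the function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def class_matrices(M, window_states):
    """Build M- and M+ over the 2|S| classes (all S- first, then all S+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            M_minus[i][j] = p            # not hit stays not hit
            M_minus[n + i][n + j] = p    # hit stays hit
            M_plus[n + i][n + j] += p    # hit stays hit
            if j in window_states:
                M_plus[i][n + j] += p    # entering the window: becomes hit
            else:
                M_plus[i][j] += p
    return M_minus, M_plus


def hit_probability_given_observation(M, start, window_states, t_begin, t_end, observed_state):
    """P(window was hit | object observed in observed_state at time t_end)."""
    n = len(M)
    M_minus, M_plus = class_matrices(M, window_states)
    v = list(start) + [0.0] * n
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t >= t_begin else M_minus)
    hit, miss = v[n + observed_state], v[observed_state]
    return hit / (hit + miss), v  # assumes the observation has non-zero probability


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
posterior, classes = hit_probability_given_observation(M, [1, 0, 0], {0, 1}, 2, 3, 2)
```

Here `observed_state=2` is s3, so `posterior` is the ratio of the S3+ mass to the total mass of S3- and S3+.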

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: space-time diagram with axes time and location and the label Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
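The basic sampling idea can be sketched for the window query of the running example. Note the limits of this illustration: it only samples forward from a single initial observation and does not implement the adapted transition matrices needed to make samples conform with several observations. The function names and the fixed seed are assumptions of this sketch.

```python
import random


def sample_path(M, start_state, steps, rng):
    """Draw one possible trajectory of the Markov chain as a list of states."""
    path = [start_state]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def estimate_window_probability(M, start_state, window_states, t_begin, t_end,
                                samples, seed=0):
    """Fraction of sampled trajectories that visit the window during [t_begin, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        path = sample_path(M, start_state, t_end, rng)
        if any(path[t] in window_states for t in range(t_begin, t_end + 1)):
            hits += 1
    return hits / samples


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, samples=20000)
```

With enough samples the estimate settles near the exact window-query result of 0.96; the same loop works for queries such as kNN where no closed-form matrix solution is available, by replacing the predicate inside `any(...)`.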

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: „Probabilistic Similarity Search for Uncertain Time Series“. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127

rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

Page 76: Managing Uncertainty in Spatial and Spatio-temporal Data

76

› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distance from q is smaller than r

› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are the candidates

› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate

[Figure: road network with vertices A, B, C, D, E and query point q. Annotations: earliest/latest times of arrival; possible path, but definitely not part of the answer to the range query; expansion tree “bubbling” along road-network segments]
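The network-expansion step can be sketched as a distance-bounded Dijkstra search. This is a minimal illustration, not the authors' implementation: the function name `expansion_tree`, the adjacency-list layout, and all edge lengths are assumptions, and the sketch tracks reachable vertices rather than the movement R-trees attached to the edges.

```python
import heapq

def expansion_tree(graph, q, r):
    """Return {vertex: network distance from q} for all vertices closer than r.

    graph: dict mapping vertex -> list of (neighbor, edge_length) pairs
    (a hypothetical layout chosen for this sketch).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            # Expand only while the network distance stays below r.
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy network: vertex names mirror the figure labels, edge lengths are made up.
graph = {
    "q": [("A", 1.0), ("C", 2.5)],
    "A": [("q", 1.0), ("B", 1.0)],
    "B": [("A", 1.0), ("D", 3.0)],
    "C": [("q", 2.5), ("E", 1.0)],
    "D": [("B", 3.0)],
    "E": [("C", 1.0)],
}
reached = expansion_tree(graph, "q", r=3.0)
print(sorted(reached))
```

Only the edges incident to the reached vertices would then have their movement R-trees fetched in the filtering step.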

77

Temporal-continuous query

– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path

– Easy to prove that the actual QP is monotonic between consecutive critical time instants

– The QPs at the critical time instants define an envelope function for the actual QPs

78

Uncertainty – the flip side

› Data reduction
– Why transmit every single location point?

› Bandwidth
› Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
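To make the "acceptable error" idea concrete, here is a minimal sketch of the basic recursive Douglas-Peucker simplification. It is not the sped-up variant from the cited paper; the function names, the sample track, and the tolerance `epsilon` are assumptions for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep only points that deviate more than epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # everything in between is within the tolerance
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point

# Hypothetical location track (x, y); values are made up.
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(track, epsilon=1.0)
print(simplified)
```

Every dropped point lies within `epsilon` of the segment that replaces it, so `epsilon` acts as the bound on the location uncertainty introduced by the reduction.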

79

Uncertainty – the other flip side

› What are the important properties that must not be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not just another “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension

82

Merge the approaches

› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given

› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435-443

83

A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) over time (t) with query window Q; reachable region bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: as before, with vmin, vmax and Q]

Assuming uniform distribution…

86

A highway example…

[Figure: as before; the car is inside Q with probability 50% at one time step and 30% at the other]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: as before, with probabilities 50% and 30%]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5
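The two results on this slide can be related through inclusion-exclusion. As a sketch, let A and B (names introduced here, not on the slide) be the events that the car is inside Q at the two time steps, with P(A) = 0.5 and P(B) = 0.3:

```latex
\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
\text{independence:}\quad P(A \cap B) &= 0.5 \cdot 0.3 = 0.15
  \;\Rightarrow\; P(A \cup B) = 0.5 + 0.3 - 0.15 = 0.65 \\
\text{dependent value from the slide:}\quad P(A \cup B) &= 0.5
  \;\Rightarrow\; P(A \cap B) = 0.5 + 0.3 - 0.5 = 0.3 = P(B)
\end{align*}
```

Read this way, the value 0.5 means that practically every world in which the car is in Q at the 30% time step is also a world in which it is in Q at the 50% time step, a correlation the independence assumption cannot express.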

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model that can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: interior hole: move left 0.4, stay 0.2, move right 0.4; border hole: move inward 0.6, stay 0.4]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

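How can we model this?

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$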

94

How can we model this?

› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0)$

› Second hit: $(0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) \cdot M = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0)$

› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} = (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08)$
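The vector-matrix products above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper names `step` and `board_matrix` are introduced here, and the matrix is rebuilt from the probabilities given on the slides (interior 0.4 / 0.2 / 0.4, border 0.4 stay / 0.6 move).

```python
def step(p, M):
    """One hit of the board: row vector p times transition matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Transition matrix of the wooden-board example with n buckets."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

M = board_matrix()
p0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third bucket
p1 = step(p0, M)                   # distribution after the first hit
p2 = step(p1, M)                   # distribution after the second hit

p40 = p0
for _ in range(40):
    p40 = step(p40, M)             # distribution after the 40th hit

# Long-run distribution as shown on the slide.
pi = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in p1])
print([round(x, 2) for x in p2])
print([round(x, 2) for x in p40])
```

The vector `pi` is left unchanged by one more multiplication with M, which is why the distribution on the slide settles at these values after many hits.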

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
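One simple way to learn such transition probabilities is to count observed state changes in historical trajectories and normalise each row. The slides do not say which estimator is used, so the following is only a sketch under that assumption; `estimate_transitions` and the sample history are hypothetical.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Relative-frequency estimate of P(next state | current state).

    trajectories: list of state sequences, one state per tick.
    Returns a sparse dict-of-dicts; unseen transitions are simply absent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

# Made-up historical data: three trajectories over states s1, s2, s3.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s3", "s3", "s2", "s3"],
    ["s2", "s1", "s3", "s2"],
]
P = estimate_transitions(history)
print(P)
```

Storing only observed transitions keeps the representation sparse, in line with the remark that the matrix is very sparse; separate estimates per time of day or per object group would follow the same pattern.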

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: state graph over s1, s2, s3 whose edges carry the transition probabilities of M]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

99

ST - Window Queries

t=1: (1, 0, 0) · M = (0, 0, 1)

100

ST - Window Queries

t=2: (0, 0, 1) · M = (0, 0.8, 0.2)

101

ST - Window Queries

t=3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)

102

ST - Window Queries

t=3, following only the worlds that had not yet hit the window at t=2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

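ST - Window Queries

States: s1, s2, s3, s4 (s4 is the new absorbing state for trajectories that have hit the window)

$$
M^- = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0.6 & 0 & 0.4 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.2 & 0.8 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
(1\ 0\ 0\ 0) \xrightarrow{M^-} (0\ 0\ 1\ 0) \xrightarrow{M^+} (0\ 0\ 0.2\ 0.8) \xrightarrow{M^+} (0\ 0\ 0.04\ 0.96)
$$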
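The construction with the extra state can be sketched in plain Python. The helper names (`step`, `window_matrices`, `window_probability`) are introduced here for illustration and are not taken from the cited paper; the sketch assumes M− is applied for transitions into time steps outside T and M+ for transitions into time steps inside T.

```python
def step(p, M):
    """Row vector p times matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]

def window_matrices(M, window_states):
    """Build M- and M+ with one extra absorbing 'hit' state appended."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: original transitions, the hit state only keeps its mass.
    M_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # M+: transitions into window states are redirected to the hit state.
    M_plus = []
    for row in M:
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        kept.append(sum(row[j] for j in window_states))
        M_plus.append(kept)
    M_plus.append(absorbing)
    return M_minus, M_plus

def window_probability(M, start, window_states, t_from, t_to):
    """P(object is in a window state at some tick in [t_from, t_to])."""
    M_minus, M_plus = window_matrices(M, window_states)
    p = list(start) + [0.0]
    if t_from <= 0:
        # The start distribution itself may already lie in the window.
        for j in window_states:
            p[-1] += p[j]
            p[j] = 0.0
    for t in range(1, t_to + 1):
        p = step(p, M_plus if t >= t_from else M_minus)
    return p[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Window states s1, s2 are indices 0 and 1; T = [2, 3]; the car starts in s1.
prob = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(round(prob, 2))
```

The cost is one vector-matrix product per tick, independent of the exponential number of possible paths.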

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; one with a single observation at t0, one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

Multiple Observations

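State order of the vectors: S1−, S2−, S3− (¬∎: window not hit), S1+, S2+, S3+ (∎: window hit)

$$
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
$$

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)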

107

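Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory hits the window and obs the observation:

$$
P(\blacksquare \mid \mathrm{obs})
= \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}
$$

Distribution at t=3 over (S1−, S2−, S3−, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$
P(\blacksquare \mid \mathrm{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89
$$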
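The doubled state space and the final conditioning step can be checked numerically. The sketch below assumes the state order S1−, S2−, S3−, S1+, S2+, S3+ and an observation in s3 at t=3; the helper names `step` and `doubled_matrices` are hypothetical.

```python
def step(p, M):
    """Row vector p times matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]

def doubled_matrices(M, window_states):
    """Build M- and M+ over the classes S1-..Sn- followed by S1+..Sn+."""
    n = len(M)
    zero = [0.0] * n
    # M-: no window at the target tick, both halves evolve by M.
    M_minus = [list(row) + zero for row in M] + [zero + list(row) for row in M]
    # M+: 'minus' worlds entering a window state switch to the 'plus' half.
    M_plus = []
    for row in M:
        stay = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        M_plus.append(stay + hit)
    M_plus += [zero + list(row) for row in M]
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = doubled_matrices(M, {0, 1})   # window states s1, s2

p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # t=0: object in s1, window not hit
p = step(p, M_minus)                  # t=1 is outside T = [2, 3]
p = step(p, M_plus)                   # t=2 is inside T
p = step(p, M_plus)                   # t=3 is inside T

# Condition on the observation "object is in s3 at t=3" (classes S3- and S3+).
s3_minus, s3_plus = p[2], p[5]
posterior = s3_plus / (s3_plus + s3_minus)
print(round(posterior, 2))
```

Keeping the state identity inside the hit class is what makes the conditioning possible; a single absorbing hit state would lose the information about where the hit worlds are located.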

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: location over time; label “Poid”]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of the transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
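The sampling idea can be illustrated on the three-state window query from earlier in the tutorial, where the exact answer 0.96 is known. This sketch only does plain forward sampling from a single start observation; it does not implement the adapted transition matrices of the cited paper, and the function names, sample count, and seed are assumptions.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_path(M, start, length, rng):
    """Draw one possible trajectory of the given length from the chain."""
    path = [start]
    for _ in range(length):
        current = path[-1]
        nxt = rng.choices(range(len(M)), weights=M[current])[0]
        path.append(nxt)
    return path

def estimate_window_probability(M, start, window_states, t_from, t_to,
                                n_samples, seed=0):
    """Fraction of sampled trajectories that hit the window during [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start, t_to, rng)
        if any(path[t] in window_states for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples

# Start in s1 (index 0), window states s1 and s2, T = [2, 3].
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000)
print(estimate)
```

With a second observation later in time, most forward samples drawn this way would have to be rejected for missing it, which is the efficiency problem the slide raises.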

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 77: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

77

Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the

earliest arriving time and latest departure time at each vertex along the possible path

ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants

ndash The QPs critical time instants define an envelop function for the actual QPs

78

Uncertainty ndash the flip-side

rsaquo Data reductionndash Why transmitting every single location point

rsaquo Bandwidthrsaquo Energy

rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size

John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

› We need to track where true hit worlds are located
– 2·|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition diagram over the states s1, s2, s3, unrolled for t=0 to t=3]

106

Multiple Observations

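State order: (S1-, S2-, S3-, S1+, S2+, S3+); the classes Si- hold the worlds with ¬∎, i.e. no intersection with the window so far.

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \quad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

State vectors: t=0: (1, 0, 0, 0, 0, 0); t=1: (0, 0, 1, 0, 0, 0); t=2: (0, 0, 0.2, 0, 0.8, 0); t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

[Figure: transition diagram over the states s1, s2, s3, unrolled for t=0 to t=3]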

107

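Bayes' Theorem

› Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3?

Here ∎ denotes the event that the trajectory intersects the query window, and obs denotes the observation of the object in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

State vector at t=3 over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} = 0.89$$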
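The two-class construction and the final Bayes step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, states are indexed from 0, the query interval is assumed to start at t >= 1, and the second observation is assumed to be made at the end of the interval in a state that is reachable.

```python
def step(vec, matrix):
    """Advance one tick: row vector times transition matrix."""
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(len(matrix[0]))]


def class_matrices(m, window):
    """Build M- and M+ over the 2*|S| classes (S1-..Sn-, S1+..Sn+)."""
    n = len(m)
    zeros = [0.0] * n
    # M-: block diagonal, no world changes its hit / no-hit class.
    m_minus = [list(row) + zeros for row in m] + [zeros + list(row) for row in m]
    # M+: a no-hit world that enters a window state moves to the '+' class.
    m_plus = []
    for row in m:
        stay_minus = [0.0 if j in window else p for j, p in enumerate(row)]
        move_plus = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay_minus + move_plus)
    for row in m:
        m_plus.append(zeros + list(row))  # hit worlds stay hit worlds
    return m_minus, m_plus


def hit_probability_given_observation(m, start_state, window,
                                      t_from, t_to, observed_state):
    """P(window was hit in [t_from, t_to] | object observed in observed_state at t_to)."""
    n = len(m)
    m_minus, m_plus = class_matrices(m, window)
    vec = [0.0] * (2 * n)
    vec[start_state] = 1.0
    for t in range(1, t_to + 1):
        vec = step(vec, m_plus if t >= t_from else m_minus)
    no_hit = vec[observed_state]   # class Si-
    hit = vec[n + observed_state]  # class Si+
    return hit / (hit + no_hit)    # Bayes: P(hit and obs) / P(obs)


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior = hit_probability_given_observation(M, 0, {0, 1}, 2, 3, 2)
print(round(posterior, 2))
```

For the running example (start in s1, window {s1, s2}, T = [2, 3], object seen in s3 at t=3) this should give 0.32 / (0.32 + 0.04), i.e. roughly 0.89.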

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on the R-Tree, indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: index entries over location and time]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
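The sampling idea can be illustrated on the window query from the earlier slides, where the exact answer (0.96) is known. The sketch below is an assumption-laden toy: it only does plain forward sampling from a single observation with Python's `random` module, and it does not implement the adapted transition matrices of the cited paper, which are needed when samples must agree with several observations. All function names are hypothetical.

```python
import random


def sample_trajectory(m, start_state, steps, rng):
    """Draw one possible world: a state sequence of length steps + 1."""
    path = [start_state]
    for _ in range(steps):
        current = path[-1]
        # Pick the next state according to the current row of the matrix.
        path.append(rng.choices(range(len(m)), weights=m[current])[0])
    return path


def estimate_window_probability(m, start_state, window, t_from, t_to,
                                n_samples, seed=0):
    """Ratio of sampled trajectories that are in `window` during [t_from, t_to]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_trajectory(m, start_state, t_to, rng)
        if any(path[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000)
print(estimate)
```

The estimate should land close to 0.96. How close depends on the number of samples, which is why the slide asks for a sufficiently high number.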

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 78: Managing Uncertainty in Spatial and Spatio-temporal Data

78

Uncertainty – the flip-side

› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy

› Adapt spatial/map approaches: define an acceptable error, save on data size

John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
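To make "define an acceptable error, save on data size" concrete, here is a small sketch of the basic recursive Douglas-Peucker line simplification. It is the textbook version, not the sped-up variant from the cited paper, and it treats a trajectory as a plain 2D polyline. The names `simplify` and `epsilon` are assumptions made for this illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Keep only points needed to stay within `epsilon` of the original line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every intermediate point is within the acceptable error: drop them.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right


track = [(0, 0), (1, 0.05), (2, -0.05), (3, 2), (4, 0.02), (5, 0)]
reduced = simplify(track, epsilon=0.1)
print(len(track), "->", len(reduced))
```

A larger `epsilon` drops more points, which saves bandwidth and energy at the price of more location uncertainty on the receiving side.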

79

Uncertainty – the other-flip-side

› What are the important properties not to be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)

› What is the impact/role of time in the picture?
– It is not a "Z" axis … one cannot travel back in time
– How long should the sink data be allowed to be "un-fresh"?

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no

vs.

Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

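A highway example…

[Figure: position (pos) over time (t) with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: pos over t with query window Q and the region bounded by vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: pos over t with query window Q and the region bounded by vmin and vmax]

Assuming uniform distribution…

86

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at one point in time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

87

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at one point in time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: pos over t; the car is inside Q with probability 50% at one point in time and 30% at another]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5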
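The gap between the two numbers can be reproduced with a tiny possible-worlds calculation. The joint distribution below is hypothetical and chosen only so that its marginals match the slide (50% and 30%). It assumes that, because of the speed constraints, every world that is inside Q at the second time was already inside Q at the first time.

```python
def at_least_once_independent(probabilities):
    """P(inside Q at least once) if the time steps were independent."""
    miss_all = 1.0
    for p in probabilities:
        miss_all *= 1.0 - p
    return 1.0 - miss_all


def at_least_once_possible_worlds(worlds):
    """Sum the probabilities of all worlds that are inside Q at some time."""
    return sum(prob for prob, inside_flags in worlds if any(inside_flags))


# Hypothetical joint distribution: (probability, (inside Q at t1, inside Q at t2)).
hypothetical_worlds = [
    (0.3, (True, True)),    # inside Q at both times
    (0.2, (True, False)),   # inside Q only at the first time
    (0.5, (False, False)),  # never inside Q
]

independent = at_least_once_independent([0.5, 0.3])
dependent = at_least_once_possible_worlds(hypothetical_worlds)
print(round(independent, 2), round(dependent, 2))
```

Under these assumptions the independence formula yields 0.65 while the possible-worlds sum yields 0.5. The independence assumption counts worlds that the speed constraints rule out.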

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

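› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: board with holes; transition probabilities 0.4 (left neighbour), 0.2 (stay), 0.4 (right neighbour) for an inner hole, and 0.6 and 0.4 for a border hole]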

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

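› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$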

94

How can we model this

› First hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$

Here M is the 9 × 9 transition matrix of the board example (rows: from bucket, columns: to bucket).
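The repeated vector-matrix products can be checked with a short plain-Python sketch. The helper names (`build_board_matrix`, `distribution_after`) are made up for this illustration. The matrix entries follow the board example: 0.4 / 0.2 / 0.4 for inner buckets, and 0.4 (stay) and 0.6 (move) at the two borders.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(vec, matrix):
    """One hit: row vector times transition matrix."""
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(len(matrix[0]))]


def distribution_after(start, matrix, hits):
    """Distribution of the ball position after the given number of hits."""
    vec = list(start)
    for _ in range(hits):
        vec = step(vec, matrix)
    return vec


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third bucket
for hits in (1, 2, 40):
    print(hits, [round(p, 2) for p in distribution_after(start, M, hits)])
```

After one and two hits the output should match the vectors on the slide. After many hits the distribution flattens out and no longer depends much on the starting bucket.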

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition diagram over the states s1, s2, s3]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

State vector: t=0: (1, 0, 0)

[Figure: transition diagram over the states s1, s2, s3, unrolled for t=0 to t=3]

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

State vectors: t=0: (1, 0, 0); t=1: (0, 0, 1)

[Figure: transition diagram over the states s1, s2, s3, unrolled for t=0 to t=3]

Page 79: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

79

Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction

ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)

rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo

80

Outline

rsaquo Introduction

rsaquo Uncertain Spatial Data

rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)

rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

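$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distribution over (s1, s2, s3, s4): t = 0: (1, 0, 0, 0); t = 1: (0, 0, 1, 0); t = 2: (0, 0, 0.2, 0.8); t = 3: (0, 0, 0.04, 0.96)

[Figure: transition graph over s1, s2, s3 and the new state s4]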
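The construction of the two matrices can be sketched as follows. This is a minimal Python sketch, assuming that M− is applied for time steps outside the query interval and M+ for time steps inside it, and that the start time lies outside the interval; the function names and 0-based indices are illustrative, not taken from the slides.

```python
def vec_mat(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def build_window_matrices(matrix, window_states):
    """Return (M_minus, M_plus), each with one extra absorbing 'hit' state appended."""
    n = len(matrix)
    m_minus, m_plus = [], []
    for i in range(n):
        # Outside the query interval: plain transitions, the hit state is unreachable.
        m_minus.append(list(matrix[i]) + [0.0])
        # Inside the query interval: mass entering a window state goes to the hit state.
        row = [0.0 if j in window_states else matrix[i][j] for j in range(n)]
        row.append(sum(matrix[i][j] for j in window_states))
        m_plus.append(row)
    absorbing = [0.0] * n + [1.0]
    m_minus.append(list(absorbing))
    m_plus.append(list(absorbing))
    return m_minus, m_plus


def window_query(matrix, start_distribution, window_states, window_times, horizon):
    """Propagate the distribution; the last entry is the probability of a window hit."""
    m_minus, m_plus = build_window_matrices(matrix, window_states)
    state = list(start_distribution) + [0.0]
    for t in range(1, horizon + 1):
        state = vec_mat(state, m_plus if t in window_times else m_minus)
    return state


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
final = window_query(M, [1.0, 0.0, 0.0], window_states={0, 1}, window_times={2, 3}, horizon=3)
print([round(p, 2) for p in final])
```

With the slide's matrix this reproduces the final vector (0, 0, 0.04, 0.96). The cost is one vector-matrix product per time step instead of one term per possible path.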

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, the first with a single observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled along the time steps t = 0, 1, 2, 3]

106

Multiple Observations

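State distribution over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$, where the classes $S_i^-$ hold the worlds with ¬∎ (window not intersected): t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled along the time steps t = 0, 1, 2, 3]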

107

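Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$P(\blacksquare \mid \mathit{obs}) = \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})} = \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})} = \frac{P(\mathit{obs} \wedge \blacksquare)}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

Here ∎ denotes the event that the trajectory intersects the query window, and obs denotes the observation of the object in s3 at t = 3.

State distribution at t = 3 over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: (0, 0, 0.04, 0.48, 0.16, 0.32)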
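The bookkeeping over the 2|S| classes and the final conditioning step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the start state lies outside the window at t = 0, the second observation is made at the last time step, and the observed state is reachable (otherwise the division fails). All function names are illustrative.

```python
def vec_mat(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def build_class_matrices(matrix, window_states):
    """Return (M_minus, M_plus) over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(matrix)
    zeros = [0.0] * n
    # M_minus is block diagonal: a transition never changes the hit / no-hit class.
    m_minus = [list(matrix[i]) + zeros for i in range(n)]
    m_minus += [zeros + list(matrix[i]) for i in range(n)]
    # M_plus: a no-hit world that enters a window state switches to the '+' class.
    m_plus = []
    for i in range(n):
        stay = [0.0 if j in window_states else matrix[i][j] for j in range(n)]
        switch = [matrix[i][j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(stay + switch)
    m_plus += [zeros + list(matrix[i]) for i in range(n)]
    return m_minus, m_plus


def hit_probability_given_observation(matrix, start_state, window_states,
                                      window_times, observed_state, observed_time):
    """P(window hit | object observed in observed_state at observed_time)."""
    n = len(matrix)
    m_minus, m_plus = build_class_matrices(matrix, window_states)
    state = [0.0] * (2 * n)
    state[start_state] = 1.0
    for t in range(1, observed_time + 1):
        state = vec_mat(state, m_plus if t in window_times else m_minus)
    miss = state[observed_state]      # P(obs and not hit)
    hit = state[n + observed_state]   # P(obs and hit)
    return hit / (hit + miss), state


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
posterior, classes = hit_probability_given_observation(
    M, start_state=0, window_states={0, 1}, window_times={2, 3},
    observed_state=2, observed_time=3,
)
print(round(posterior, 2), [round(p, 2) for p in classes])
```

For the slide's example the class vector at t = 3 is (0, 0, 0.04, 0.48, 0.16, 0.32) and the conditional probability is 0.32 / 0.36, which rounds to 0.89.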

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index structure over the axes time and location; label: Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
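The basic Monte-Carlo estimate can be sketched as follows in Python, reusing the three-state window query from the earlier slides as a stand-in query. This is only a sketch of plain forward sampling from a single observation at t = 0; it does not implement the adaptation of transition matrices that makes samples conform with later observations. The function names, the seed, and the sample count are illustrative assumptions.

```python
import random

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def sample_path(matrix, start, horizon, rng):
    """Draw one possible trajectory by following the transition probabilities."""
    path = [start]
    for _ in range(horizon):
        weights = matrix[path[-1]]
        path.append(rng.choices(range(len(weights)), weights=weights, k=1)[0])
    return path


def estimate_window_probability(matrix, start, window_states, window_times,
                                horizon, num_samples, seed=0):
    """Ratio of sampled trajectories that satisfy the window query."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        path = sample_path(matrix, start, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / num_samples


estimate = estimate_window_probability(
    M, start=0, window_states={0, 1}, window_times=(2, 3), horizon=3, num_samples=20000
)
print(f"estimated P(hit) = {estimate:.3f}")
```

The estimate should land close to the exact value 0.96 computed with the matrix approach; its standard error shrinks with the square root of the number of samples.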

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes’ Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 80: Managing Uncertainty in Spatial and Spatio-temporal Data

80

Outline

› Introduction

› Uncertain Spatial Data

› Uncertain Spatio-Temporal Data (Geometric Approach)

› Uncertain Spatio-Temporal Data (Probabilistic Approach)

81

So far …

Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no

vs

Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443.

83

A highway example…

[Figure: position (pos) of the car over time (t), with query window Q]

What is the probability that the car is in some area at least once during some time?

84

A highway example…

[Figure: position (pos) of the car over time (t), with query window Q and the speed bounds vmin and vmax]

What is the probability that the car is in some area at least once during some time?

85

A highway example…

[Figure: position (pos) of the car over time (t), with query window Q and the speed bounds vmin and vmax]

Assuming uniform distribution…

86

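A highway example…

[Figure: position (pos) of the car over time (t), with query window Q; the labels 50 and 30 (percent) mark the probability of being inside Q at two points in time]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65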

87

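A highway example…

[Figure: position (pos) of the car over time (t), with query window Q; the labels 50 and 30 (percent) mark the probability of being inside Q at two points in time]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Violation of the speed constraints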

88

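A highway example…

[Figure: position (pos) of the car over time (t), with query window Q; the labels 50 and 30 (percent) mark the probability of being inside Q at two points in time]

Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65

Consideration of dependency: 0.5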
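The arithmetic behind the independence assumption can be written out as a short Python sketch. The function name is illustrative; the value 0.5 under dependency is the figure stated on the slide and is not derived by this code, since it depends on the motion model.

```python
def at_least_once_independent(probabilities):
    """P(inside Q at least once) if the per-time events were independent."""
    miss_all = 1.0
    for p in probabilities:
        miss_all *= 1.0 - p  # probability of missing Q at this time as well
    return 1.0 - miss_all


independent = at_least_once_independent([0.5, 0.3])
print(round(independent, 2))  # 1 - (1 - 0.5)(1 - 0.3)
```

Treating the two time steps as independent overestimates the result here (0.65 versus the slide's 0.5), because it counts worlds in which the car would have to violate the speed constraints.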

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

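› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: a row of holes in a wooden board; from an inner hole the ball moves to either neighbour with probability 0.4 and stays with probability 0.2; at the border it moves to the neighbour with probability 0.6 and stays with probability 0.4]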

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

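› A Markov Chain is a “memoryless” stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix} 0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4 \end{pmatrix}$$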

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)

Here M is the 9 × 9 bucket-to-bucket transition matrix of the board example (0.4, 0.2, 0.4 for inner buckets; 0.4 and 0.6 at the borders).
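The repeated vector-matrix multiplication for the board example can be sketched in plain Python. The helper names are illustrative assumptions; the matrix follows the slide (nine buckets, ball starting in the third bucket).

```python
def build_board_matrix(num_buckets=9):
    """Transition matrix of the board: rows are 'from bucket', columns 'to bucket'."""
    m = [[0.0] * num_buckets for _ in range(num_buckets)]
    for i in range(num_buckets):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6          # left border
        elif i == num_buckets - 1:
            m[i][i - 1], m[i][i] = 0.6, 0.4      # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def vec_mat(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def distribution_after(start_bucket, hits, matrix):
    """Distribution of the ball after the given number of hits."""
    state = [0.0] * len(matrix)
    state[start_bucket] = 1.0
    for _ in range(hits):
        state = vec_mat(state, matrix)
    return state


M = build_board_matrix()
for hits in (1, 2, 40):
    print(hits, [round(p, 2) for p in distribution_after(2, hits, M)])
```

After many hits the distribution approaches the vector (0.08, 0.12, ..., 0.12, 0.08) from the slide, which is left unchanged by a further multiplication with M.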

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

Page 81: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

81

So far hellip

Stochastic methods for uncertain spatial data

Probabilisitc results

Time dimension

Geometric models for uncertain

spatio-temporal data

Probabilisitc results

Time dimension

vs

82

Merge the approaches

rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is

given

rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series

rsaquo Problem Independency assumption prohibits time-parametrized queries

R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, the first with one observation at t0, the second with observations at t0 and t1]

105

Multiple Observations

› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]

106

Multiple Observations

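› State order: $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; the classes $S_i^-$ hold the worlds that have not intersected the window ($\neg\blacksquare$)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$

[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]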

107

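Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With $\blacksquare$ denoting the event that the trajectory intersects the window and $obs$ the observation in s3:

$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare) \cdot P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs)} = \frac{P(obs \wedge \blacksquare)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$

Reading $S_3^+ = 0.32$ and $S_3^- = 0.04$ from the final vector over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+) = (0, 0, 0.04, 0.48, 0.16, 0.32)$:

$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$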
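The doubled state space and the final conditioning step can be sketched as follows. This is a hedged illustration rather than the method of the cited paper: the helper names are hypothetical, states are indexed from 0, the start time is assumed to lie outside the window, and the second observation is assumed to be made at the last time step (t = 3).

```python
# Transition matrix of the running example (rows: from state, columns: to state).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def doubled_matrices(m, window_states):
    """Build M- and M+ over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    zero = [0.0] * n
    # M- is block diagonal: nothing changes its hit status outside the window.
    m_minus = [row[:] + zero for row in m] + [zero + row[:] for row in m]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Not hit yet: entering a window state moves the world to the "+" copy.
            m_plus[i][j + n if j in window_states else j] += m[i][j]
            # Already hit: the world stays in the "+" copy.
            m_plus[i + n][j + n] += m[i][j]
    return m_minus, m_plus

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def hit_given_observation(m, start, horizon, window_states, window_times, observed):
    """Return the final class vector and P(hit | object observed in `observed` at the end)."""
    n = len(m)
    m_minus, m_plus = doubled_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start] = 1.0  # assumption: t = 0 is not part of the window
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    not_hit, hit = v[observed], v[observed + n]
    # Raises ZeroDivisionError if the observation is impossible under the model.
    return v, hit / (hit + not_hit)

v, p = hit_given_observation(M, start=0, horizon=3,
                             window_states={0, 1}, window_times={2, 3}, observed=2)
print(v, p)
```

Only the two classes that agree with the observation enter the quotient; all other entries of `v` belong to worlds that the second observation rules out.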

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index entries in the time/location space, labelled "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
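As a rough illustration of the sampling idea, the following sketch estimates the window-query probability of the three-state example by drawing trajectories forward from a single observation. It is not the sampling scheme of the cited paper: it does not adapt the transition matrices, so the samples only conform with the first observation, and the function names, sample count and seed are arbitrary choices made for this example.

```python
import random

# Transition matrix of the running example (rows: from state, columns: to state).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_path(m, start, steps, rng):
    """Draw one possible trajectory; path[t - 1] is the state at time t."""
    path, state = [], start
    for _ in range(steps):
        state = rng.choices(range(len(m)), weights=m[state])[0]
        path.append(state)
    return path

def estimate_window_probability(m, start, steps, window_states, window_times,
                                samples, seed=0):
    """Fraction of sampled trajectories that visit a window state at a window time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        path = sample_path(m, start, steps, rng)
        if any(path[t - 1] in window_states for t in window_times):
            hits += 1
    return hits / samples

estimate = estimate_window_probability(M, start=0, steps=3,
                                       window_states={0, 1}, window_times=[2, 3],
                                       samples=20000)
print(estimate)  # should be close to the exact value 0.96
```

With a later observation, most forward samples of this kind would have to be rejected because they miss the observed state, which is the inefficiency the slide points at.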

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (i.e. spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)

82

Merge the approaches

› Idea
– Discretize the time
– For each time a pdf (pmf) over the possible positions is given

› Examples
– Nearest Neighbour Queries on uncertain moving objects
– Similarity Search on uncertain time series

› Problem: the independence assumption prohibits time-parametrized queries

R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

83

A highway example…

› What is the probability that the car is in some area at least once during some time?

[Figure: position (pos) over time (t) with the query window Q; the positions reachable by the car are bounded by its minimum and maximum speed (vmin, vmax)]

› Assuming a uniform distribution over the reachable positions, 50% and 30% of the probability mass lie inside Q at the two points in time covered by Q

› Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
– Violation of the speed constraints: this also counts worlds in which the car would have to move faster or slower than allowed between the two points in time

› Consideration of dependency: $= 0.5$

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities of the ball: 0.4 to either neighbour hole and 0.2 to stay; at the border 0.6 to the neighbour hole and 0.4 to stay]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

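How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$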

94

How can we model this?

› First hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$

› Second hit:

$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$

› 40th hit:

$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
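The three computations above are repeated vector-matrix products, which the following minimal sketch reproduces in plain Python. The function names are invented for this example, and the matrix is rebuilt from the probabilities given on the slides (0.4 / 0.2 / 0.4 in the interior, 0.4 / 0.6 at the border).

```python
def build_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border: stay 0.4, move right 0.6
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border: move left 0.6, stay 0.4
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:                           # interior: left 0.4, stay 0.2, right 0.4
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit of the board: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def propagate(dist, m, hits):
    """Apply the transition matrix `hits` times."""
    for _ in range(hits):
        dist = step(dist, m)
    return dist

M = build_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]    # ball starts in the third bucket
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
limit = propagate(start, M, 500)        # many hits: practically the stationary distribution

print([round(p, 2) for p in after_1])
print([round(p, 2) for p in after_2])
print([round(p, 3) for p in after_40])
print([round(p, 3) for p in limit])
```

The values 0.08 and 0.12 are the stationary distribution of this chain; after 40 hits the distribution is already close to it, and further hits move it closer.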

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
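One simple way to learn such transition probabilities is to count the state changes observed per tick and normalize per source state. The sketch below assumes this maximum-likelihood counting scheme; the slides do not say how the parameters were actually estimated, and the function name and the toy trajectories are made up for illustration. The dict-of-dicts result stores only observed transitions, which matches the sparsity mentioned above.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Estimate P(next state | current state) from discretized historical trajectories."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        # Consecutive entries of a trajectory are one tick apart.
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    # Normalize the counts of each source state so that every row sums to 1.
    return {
        src: {dst: c / sum(dsts.values()) for dst, c in dsts.items()}
        for src, dsts in counts.items()
    }

# Hypothetical historical data: one state label per tick.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s2", "s3", "s3", "s2"],
    ["s3", "s2", "s3"],
]
probs = estimate_transitions(history)
print(probs)
```

A matrix that changes over time or per object group would correspond to one such table per time slot or group, estimated from the matching subset of the history.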

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: the states s1, s2, s3 connected by edges labelled with the transition probabilities]

Note: We have an exponential number of possible paths the car might take
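To make the note concrete, the query can be answered naively by enumerating every possible path and summing the probabilities of those that touch the window. The sketch below does exactly that; the function name is hypothetical, states are indexed from 0, and the car is assumed to start in s1 at t = 0. It visits |S|^steps paths, which is fine for three states and three ticks but does not scale.

```python
from itertools import product

# Transition matrix of the example (rows: from state, columns: to state).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def window_probability_bruteforce(m, start, steps, window_states, window_times):
    """Sum the probabilities of all paths that are in a window state at a window time."""
    n = len(m)
    total = 0.0
    for path in product(range(n), repeat=steps):    # n ** steps possible worlds
        prob, prev = 1.0, start
        for state in path:
            prob *= m[prev][state]
            prev = state
        if prob == 0.0:
            continue                                 # impossible world
        # path[t - 1] is the state at time t
        if any(path[t - 1] in window_states for t in window_times):
            total += prob
    return total

p = window_probability_bruteforce(M, start=0, steps=3,
                                  window_states={0, 1}, window_times=[2, 3])
print(p)
```

For this example only four of the 27 enumerated paths have non-zero probability, and the three that touch the window sum to 0.96 up to floating-point rounding.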

Page 83: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

83

A highway examplehellip

t

pos

Q

What is the probability that the car is in some area at least once during some time

84

A highway examplehellip

t

pos

Q

vmin vmax

What is the probability that the car is in some area at least once during some time

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect model

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: index over time and location; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
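A minimal sketch of the sampling idea follows. It uses naive rejection sampling (draw from the unconditioned chain, discard trajectories that contradict an observation), which is deliberately not the adapted-transition-matrix technique of the cited paper; it only illustrates why conforming samples are expensive to obtain this way. The chain is the three-state example from the window-query slides, and all function names are assumptions of this sketch.

```python
import random

# Three-state example chain: row = from-state, column = to-state (s1, s2, s3).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def sample_trajectory(start, steps, rng):
    """Draw one possible world (a state sequence) from the Markov chain."""
    traj = [start]
    for _ in range(steps):
        traj.append(rng.choices(range(len(M)), weights=M[traj[-1]])[0])
    return traj


def estimate(n_samples, observations, predicate, seed=0):
    """Fraction of observation-conforming samples that satisfy the predicate."""
    rng = random.Random(seed)
    accepted = satisfied = 0
    for _ in range(n_samples):
        traj = sample_trajectory(start=0, steps=3, rng=rng)
        if any(traj[t] != s for t, s in observations.items()):
            continue  # rejected: the sample contradicts an observation
        accepted += 1
        satisfied += predicate(traj)
    return satisfied / accepted if accepted else float("nan")


def hits_window(traj):
    """True if the object is in s1 or s2 at t = 2 or t = 3."""
    return any(traj[t] in (0, 1) for t in (2, 3))


unconditioned = estimate(20000, {}, hits_window)
conditioned = estimate(20000, {3: 2}, hits_window)  # object seen in s3 at t = 3
print(round(unconditioned, 2), round(conditioned, 2))
```

With the observation at t = 3 only about a third of the drawn samples survive the rejection step, which is the inefficiency the adapted transition matrices are meant to avoid.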

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
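To make the "sequence of subevents" idea concrete, here is a small sketch that computes the probability that a Markov-chain trajectory visits office, then coffee room, then office, in that order. It is not Lahar's query language or algorithm: the room chain, its probabilities, and the function names are hypothetical, and the technique shown is simply tracking the pattern progress alongside the chain state.

```python
from itertools import product

# Hypothetical indoor chain over rooms 0 = office, 1 = hallway, 2 = coffee room.
M = [
    [0.7, 0.3, 0.0],
    [0.4, 0.2, 0.4],
    [0.0, 0.5, 0.5],
]
PATTERN = [0, 2, 0]  # office, later coffee room, later office again


def advance(stage, room):
    """Move to the next subevent if the current room matches it."""
    if stage < len(PATTERN) and room == PATTERN[stage]:
        return stage + 1
    return stage


def event_probability(start_room, ticks):
    """Probability that the pattern is completed within the given ticks."""
    mass = {(start_room, advance(0, start_room)): 1.0}
    for _ in range(ticks):
        nxt = {}
        for (room, stage), p in mass.items():
            for dst, q in enumerate(M[room]):
                if q > 0.0:
                    key = (dst, advance(stage, dst))
                    nxt[key] = nxt.get(key, 0.0) + p * q
        mass = nxt
    return sum(p for (_, stage), p in mass.items() if stage == len(PATTERN))


def brute_force(start_room, ticks):
    """Reference answer: enumerate every possible world explicitly."""
    total = 0.0
    for tail in product(range(len(M)), repeat=ticks):
        path = (start_room,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= M[a][b]
        stage = 0
        for room in path:
            stage = advance(stage, room)
        if stage == len(PATTERN):
            total += prob
    return total


print(event_probability(0, 6), brute_force(0, 6))
```

The first function keeps at most |rooms| x (pattern length + 1) entries per tick, whereas the brute-force check grows exponentially with the number of ticks.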

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 84: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

84

A highway example…

[Figure: position (pos) over time (t) with speed bounds vmin and vmax and query window Q]

What is the probability that the car is in some area at least once during some time interval?

85

A highway example…

[Figure: position (pos) over time (t) with speed bounds vmin and vmax and query window Q]

Assuming uniform distribution…

86

A highway example…

[Figure: position (pos) over time (t) with query window Q; two time steps labeled 50 and 30]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

87

A highway example…

[Figure: position (pos) over time (t) with query window Q; two time steps labeled 50 and 30]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Violation of the speed constraints

88

A highway example…

[Figure: position (pos) over time (t) with query window Q; two time steps labeled 50 and 30]

Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65

Consideration of dependency: = 0.5
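The gap between 0.65 and 0.5 can be reproduced with a tiny possible-worlds table. The joint distribution below is hypothetical: it assumes that every trajectory inside Q at the second time step was already inside Q at the first one, which is one way the speed constraints could couple the two time steps. The marginals 0.5 and 0.3 are the ones from the slide.

```python
p_first, p_second = 0.5, 0.3  # marginal probabilities of being in Q at each step

# Treating the two time steps as independent events.
independent = 1 - (1 - p_first) * (1 - p_second)

# Hypothetical dependent worlds: (in Q at first step, in Q at second step) -> probability.
# Assumption: being in Q at the second step implies being in Q at the first step.
worlds = {
    (True, True): 0.3,
    (True, False): 0.2,
    (False, False): 0.5,
}

# The marginals of the joint table agree with the slide's 0.5 and 0.3.
marginal_first = sum(p for (a, _), p in worlds.items() if a)
marginal_second = sum(p for (_, b), p in worlds.items() if b)

# Probability of being in Q at least once, summed over the possible worlds.
dependent = sum(p for (a, b), p in worlds.items() if a or b)

print(round(independent, 2), round(dependent, 2))
```

Both computations use the same marginals; only the joint distribution differs, which is why a model of the temporal dependency is needed.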

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

J. L. Doob. Stochastic Processes. Wiley, 1953.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

How can we model this

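› A Markov Chain is a "memoryless" stochastic process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket)

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$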

94

How can we model this

› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)

› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)

› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
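The vector-matrix products above can be checked with a short script. This is a plain-Python sketch (the helper names are made up here); it builds the nine-bucket matrix M and propagates the initial position. The values listed for the 40th hit coincide with the stationary distribution of M, which the chain approaches; the vector after exactly 40 hits is close to, though not necessarily identical with, those rounded values.

```python
def build_board_matrix(n=9):
    """Transition matrix of the board: row = from-bucket, column = to-bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4  # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(vec, M):
    """One hit: multiply the row vector by the transition matrix."""
    n = len(M)
    return [sum(vec[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(vec, M, hits):
    """Distribution over buckets after the given number of hits."""
    for _ in range(hits):
        vec = step(vec, M)
    return vec


M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in the third bucket

first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
fortieth = after_hits(start, M, 40)

print([round(x, 2) for x in first])
print([round(x, 2) for x in second])
print([round(x, 3) for x in fortieth])
```

Because M is sparse (at most three non-zero entries per row), a real implementation would store only the non-zero transitions rather than the dense matrix used in this sketch.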

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
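One simple way to learn transition probabilities from historical data is to count observed state changes and normalize per source state. The sketch below assumes this maximum-likelihood counting; the history is invented for illustration, and the dict-of-dicts result is one possible sparse representation, not necessarily the one used by the authors.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Estimate P(next state | current state) by counting consecutive pairs."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for current, nxt in zip(traj, traj[1:]):
            counts[current][nxt] += 1
    # Only observed transitions are stored, so the result stays sparse.
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical historical trajectories, one state per tick.
history = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3", "s2", "s1"],
    ["s3", "s2", "s3", "s2", "s3"],
]

P = estimate_transitions(history)
for state in sorted(P):
    print(state, P[state])
```

Separate matrices per time of day or per object group could be obtained by calling the same function on the corresponding subsets of the history.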

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: transition graph over the states s1, s2, s3]

Note: We have an exponential number of possible paths the car might take
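For a chain this small the query can still be answered by brute force, which makes the exponential blow-up tangible: every one of the |S|^t state sequences is a possible world. The sketch below assumes the car starts in s1 at t = 0 (as in the slides that follow); the function name is made up for this illustration.

```python
from itertools import product

# row = from-state, column = to-state (s1, s2, s3)
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def window_probability_by_enumeration(start, horizon, window_states, window_times):
    """Sum the probabilities of all paths that touch the window at least once."""
    total = 0.0
    for tail in product(range(len(M)), repeat=horizon):  # |S|**horizon paths
        path = (start,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= M[a][b]
        if prob > 0.0 and any(path[t] in window_states for t in window_times):
            total += prob
    return total


result = window_probability_by_enumeration(
    start=0, horizon=3, window_states={0, 1}, window_times=(2, 3)
)
print(round(result, 2))  # 0.96
```

The loop visits 27 paths here; with hundreds of states and ticks this enumeration is infeasible, which motivates the matrix-based solution.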

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)

98

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.0, 0.16, 0.04)

Result = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

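$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)

States: s1, s2, s3, s4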
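The two matrices can be derived mechanically from M and the window states, as the following sketch shows. It assumes that s4 is the extra absorbing state collecting all trajectories that have hit the window, that the transition arriving at a time inside T uses the "plus" matrix and every other transition the "minus" matrix, and that the window does not include t = 0. Function names are illustrative only.

```python
# row = from-state, column = to-state (s1, s2, s3)
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def build_matrices(M, window_states):
    """Return (M_minus, M_plus), both extended by one absorbing hit state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in M] + [absorbing[:]]
    m_plus = []
    for row in M:
        # Probability mass entering a window state is redirected to the hit state.
        kept = [0.0 if j in window_states else p for j, p in enumerate(row)]
        redirected = sum(p for j, p in enumerate(row) if j in window_states)
        m_plus.append(kept + [redirected])
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(M, start, window_states, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]), t_start >= 1."""
    m_minus, m_plus = build_matrices(M, window_states)
    size = len(M) + 1
    vec = [0.0] * size
    vec[start] = 1.0
    for t in range(1, t_end + 1):
        mat = m_plus if t_start <= t <= t_end else m_minus
        vec = [sum(vec[i] * mat[i][j] for i in range(size)) for j in range(size)]
    return vec[-1]  # mass accumulated in the absorbing hit state


m_minus, m_plus = build_matrices(M, window_states={0, 1})
result = window_probability(M, start=0, window_states={0, 1}, t_start=2, t_end=3)
print(round(result, 2))  # 0.96
```

The cost is one sparse vector-matrix product per tick, independent of the number of possible paths.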

Page 85: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

85

A highway examplehellip

t

pos

vmin vmax

Q

Assuming uniform distributionhellip

86

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

87

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065

Violation of the speed constraints

88

A highway examplehellip

t

pos

50

30

Q

Independence assumption 1 ndash (1-05)(1-03) =

065Consideration of dependency

= 05

89

An adequate model

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

112

rsaquo Application RFID sensors track individuals in an indoor environment

rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office

rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions

Event queries

Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

113

Summary

rsaquo Models for spatial uncertainty

rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds

rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)

rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728


86

A highway example …

[Figure: position (pos) over time (t); labels 50 and 30; query window Q]

Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$

Violation of the speed constraints

Consideration of dependency: $= 0.5$

89

An adequate model

90

Stochastic Processes

› Stochastic processes are used to represent the evolution of some random value or system over time.

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)

Doob, J. L. (1953). Stochastic Processes. Wiley.

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities.

› At the border of the wooden board these probabilities are different.

› This model is usually learned or given by experts.

[Figure: transition probabilities 0.4, 0.2, 0.4 for an inner hole and 0.6, 0.4 for a border hole]

92

A simple example

› Initial position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

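How can we model this?

› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state).

› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$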

94

How can we model this?

With M the 9 × 9 transition matrix of the board example and the ball starting in the third bucket:

› First hit

$$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M = (0 \;\; 0.4 \;\; 0.2 \;\; 0.4 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$$

› Second hit

$$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{2} = (0.16 \;\; 0.16 \;\; 0.36 \;\; 0.16 \;\; 0.16 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$$

› 40th hit

$$(0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0) \cdot M^{40} \approx (0.08 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.12 \;\; 0.08)$$
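The vector-matrix products on this slide can be checked with a few lines of plain Python. The following is a minimal sketch, not the authors' code: the helper names `build_board_matrix`, `step` and `after_hits` are made up for illustration, and the matrix entries are those of the board example. It also checks that the vector shown for the 40th hit is left unchanged by M, i.e. it is the stationary distribution that the chain approaches as the number of hits grows.

```python
# Sketch of the nine-bucket board example (helper names are illustrative assumptions).

def build_board_matrix(n=9):
    """Transition matrix: row = from bucket, column = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border bucket
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border bucket
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                            # inner bucket: left, stay, right
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: multiply the row vector dist by M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, M, k):
    """Distribution after k hits, i.e. dist * M^k."""
    for _ in range(k):
        dist = step(dist, M)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]      # ball starts in the third bucket
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
stationary = [0.08] + [0.12] * 7 + [0.08]

print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in step(stationary, M)])  # unchanged by one more hit
```

Representing the distribution as a row vector keeps the code aligned with the slide's notation, where each hit is one multiplication from the right by M.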

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
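One common way to learn transition probabilities from historical data is to count observed state-to-state transitions and normalise each row. The sketch below assumes trajectories have already been discretized into one state per tick; the function name `estimate_transitions` and the three toy trajectories are hypothetical and not taken from the tutorial. A dictionary of dictionaries is used so that only non-zero entries are stored, which suits a very sparse matrix.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Relative-frequency estimate of P(next state | current state)."""
    counts = defaultdict(Counter)
    for states in trajectories:
        # Count every consecutive pair (state at tick t, state at tick t+1).
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    # Normalise each row so that its probabilities sum to 1.
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }

# Hypothetical historical data: one state label per tick.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s3", "s2", "s3", "s3"],
    ["s2", "s1", "s3", "s2"],
]
model = estimate_transitions(history)
for state in sorted(model):
    print(state, model[state])
```

Separate models per time of day or per object group could be obtained by calling the same function on the corresponding subsets of the history.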

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: state graph over s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, t = 1, t = 2, t = 3]

Note: we have an exponential number of possible paths the car might take.

State vectors over (s1, s2, s3), with the car starting in s1:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3, ignoring the window: (0.48, 0.16, 0.36)
– t = 3, only worlds that did not hit the window at t = 2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› The solution is based on matrix multiplications; it introduces a new state for the winner trajectories and two matrices.
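To make the note about exponentially many paths concrete, the result 0.96 can be reproduced by brute force: enumerate every possible path, multiply the transition probabilities along it, and add up the paths that touch s1 or s2 at t = 2 or t = 3. This is only a sketch for the three-state example; the function name `window_probability_naive` is an assumption, and the approach is exactly the one the matrix-based solution is meant to avoid, since the number of paths grows as |S|^horizon.

```python
from itertools import product

# Transition matrix of the example (index 0 = s1, 1 = s2, 2 = s3).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}   # s1 or s2
WINDOW_TIMES = {2, 3}    # T = [2, 3]

def window_probability_naive(start, horizon):
    """Sum the probabilities of all paths that intersect the query window."""
    total = 0.0
    for path in product(range(len(M)), repeat=horizon):
        prob, prev = 1.0, start
        for state in path:
            prob *= M[prev][state]
            prev = state
        if prob == 0.0:
            continue  # impossible path
        # path[0] is the state at t = 1, path[1] at t = 2, and so on.
        if any(t in WINDOW_TIMES and s in WINDOW_STATES
               for t, s in enumerate(path, start=1)):
            total += prob
    return total

p = window_probability_naive(start=0, horizon=3)
print(round(p, 2))
```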

103

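ST - Window Queries

$$
M^{-} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0.6 & 0 & 0.4 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.2 & 0.8 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

State vectors over (s1, s2, s3, s4):
– t = 0: (1, 0, 0, 0)
– t = 1: (0, 0, 1, 0)
– t = 2: (0, 0, 0.2, 0.8)
– t = 3: (0, 0, 0.04, 0.96)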
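The four state vectors on this slide follow from multiplying by M− for ticks outside the query window and by M+ for ticks inside it, where s4 is the absorbing state that collects all worlds that have hit the window. The sketch below reproduces them; the helper names `vec_mat` and `window_query` are illustrative assumptions, and which matrix is applied at which tick is inferred from the query interval T = [2, 3] of the running example.

```python
def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

# States: s1, s2, s3 and the absorbing "hit" state s4.
M_MINUS = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
# Transitions into the window states s1 and s2 are redirected to s4.
M_PLUS = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.4, 0.6],
          [0.0, 0.0, 0.2, 0.8],
          [0.0, 0.0, 0.0, 1.0]]

def window_query(start, horizon, window_times):
    """Return the state vectors for t = 0 .. horizon."""
    dist = list(start)
    history = [dist]
    for t in range(1, horizon + 1):
        dist = vec_mat(dist, M_PLUS if t in window_times else M_MINUS)
        history.append(dist)
    return history

history = window_query([1.0, 0.0, 0.0, 0.0], horizon=3, window_times={2, 3})
for t, dist in enumerate(history):
    print(t, [round(p, 2) for p in dist])
```

The probability mass in s4 after the last tick is the query result, so the cost is one vector-matrix product per tick instead of one term per possible path.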

104

Multiple Observations

› So far we had only one observation from which we could extrapolate.

› This is not really of interest, since cars do not move randomly.

› With two observations we have to introduce more artificial states and adapt the techniques.

[Figure: two plots of location space over time space, one with an observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3]

106

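Multiple Observations

State vectors over (S1−, S2−, S3−, S1+, S2+, S3+):
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\qquad
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t = 0, t = 1, t = 2, t = 3; the classes Si− are marked ¬∎ (window not intersected)]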

107

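Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory intersects the query window and obs the observation of the object in s3:

$$
P(\blacksquare \mid \mathit{obs})
= \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})}
= \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})}
= \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)}
$$

With the state vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):

$$
P(\blacksquare \mid \mathit{obs}) = \frac{0.32}{0.32 + 0.04} = 0.89
$$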
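The six-class state vector and the Bayes step can be reproduced as follows. This is a sketch under one stated assumption: the vectors shown in the tutorial are obtained when the window (states s1 and s2) is active at t = 2 only, so M+ is applied for the step into t = 2 and M− for the other steps. The function names `build_matrices` and `vec_mat` are illustrative, not from the tutorial.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}   # s1, s2

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def build_matrices(M, window_states):
    """Build M- and M+ over the classes (S1-, .., Sn-, S1+, .., Sn+)."""
    n = len(M)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            minus[i][j] = M[i][j]            # non-hit worlds stay non-hit
            minus[n + i][n + j] = M[i][j]    # hit worlds stay hit
            plus[n + i][n + j] = M[i][j]     # hit worlds stay hit
            if j in window_states:
                plus[i][n + j] = M[i][j]     # entering the window: becomes a hit
            else:
                plus[i][j] = M[i][j]
    return minus, plus

M_MINUS, M_PLUS = build_matrices(M, WINDOW_STATES)

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]        # t = 0: object in s1, no hit yet
for t in (1, 2, 3):
    # Assumption: the window is active at t = 2 only.
    dist = vec_mat(dist, M_PLUS if t == 2 else M_MINUS)

# Observation at t = 3: object seen in s3, i.e. class S3- or S3+.
p_hit_and_obs = dist[5]
p_miss_and_obs = dist[2]
posterior = p_hit_and_obs / (p_hit_and_obs + p_miss_and_obs)
print([round(p, 2) for p in dist], round(posterior, 2))
```

Keeping the hit and non-hit worlds in separate classes is what allows conditioning on the later observation: the posterior only needs the two entries belonging to the observed state.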

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed.

› Index structure based on an R-tree, indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q).

[Figure: index entries over time and location; label Poid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries.

› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how can samples be drawn efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
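The basic Monte Carlo idea can be sketched on the three-state window query from the earlier slides, whose exact answer is 0.96. The code below only shows plain forward sampling from a single start observation; it does not implement the adapted transition matrices of the cited paper, which are needed to sample trajectories that conform to several observations. The function names and the fixed seed are assumptions made for the example.

```python
import random

# Transition matrix of the running example (index 0 = s1, 1 = s2, 2 = s3).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}   # s1 or s2
WINDOW_TIMES = {2, 3}    # T = [2, 3]

def sample_path(start, horizon, rng):
    """Draw one possible trajectory (states at t = 1 .. horizon)."""
    path, state = [], start
    for _ in range(horizon):
        state = rng.choices(range(len(M)), weights=M[state])[0]
        path.append(state)
    return path

def estimate_window_probability(n_samples, seed=0):
    """Ratio of sampled trajectories that satisfy the window query."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(start=0, horizon=3, rng=rng)
        if any(t in WINDOW_TIMES and s in WINDOW_STATES
               for t, s in enumerate(path, start=1)):
            hits += 1
    return hits / n_samples

estimate = estimate_window_probability(20000)
print(estimate)
```

For queries such as kNN, only the predicate evaluated on each sampled trajectory changes; the sampling loop stays the same.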

Page 88: Managing Uncertainty in Spatial and Spatio-temporal Data

88

A highway example…

[Figure: position (pos) over time (t) with query window Q; values 50 and 30 shown in the plot]

Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65

Consideration of dependency: = 0.5
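The gap between the two numbers can be reproduced on a toy set of possible worlds. The sketch below is an illustration under assumed data, not taken from the slides: ten equally likely trajectories, five of them inside the query window Q at a first timestamp and three at a second, where the three are assumed to be a subset of the five. The names `in_q_t1` and `in_q_t2` are hypothetical.

```python
# Toy possible worlds (assumed for illustration): 10 equally likely trajectories.
# in_q_t1[i] / in_q_t2[i] say whether trajectory i lies inside Q at time 1 / 2.
in_q_t1 = [True] * 5 + [False] * 5   # P(in Q at t1) = 0.5
in_q_t2 = [True] * 3 + [False] * 7   # P(in Q at t2) = 0.3, a subset of the t1 hits

n = len(in_q_t1)
p1 = sum(in_q_t1) / n
p2 = sum(in_q_t2) / n

# Treating the two timestamps as independent events.
p_independent = 1 - (1 - p1) * (1 - p2)

# Counting worlds directly, which respects the dependency between timestamps.
p_dependent = sum(a or b for a, b in zip(in_q_t1, in_q_t2)) / n

print(round(p_independent, 2), round(p_dependent, 2))
```

Under these assumptions the independence formula overestimates the probability, because worlds that are inside Q at both timestamps are counted as if they were separate chances to hit the window.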

89

An adequate model

90

Stochastic Processes

› Stochastic Processes are used to represent the evolution of some random value or system over time

› A sound mathematical model which can be used to describe the uncertain location of an object over time

› Many Stochastic Processes for different settings
– Discrete Time + Discrete Space (e.g. Markov Chain)
– Discrete Time + Continuous Space (e.g. Harris Chain)
– Continuous Time + Discrete Space (e.g. Markov Process)
– Continuous Time + Continuous Space (e.g. Wiener Process)

Doob, J. L. (1953): Stochastic Processes. Wiley

91

A simple example

› Whenever the wooden board is hit, the ball stays or drops into one of the neighbour holes with certain probabilities

› At the border of the wooden board these probabilities are different

› This model is usually learned or given by experts

[Figure: wooden board with holes; transition probabilities 0.4, 0.2, 0.4 at an interior hole and 0.6, 0.4 at a border hole]

92

A simple example

› Initial Position

› After first hit: 0.4, 0.2, 0.4

› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16

› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08

93

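How can we model this?

› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$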
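In symbols, the memoryless property and the resulting update rule can be written as follows. The notation $X_t$ for the state at tick $t$ and $\pi_t$ for the row vector of state probabilities is introduced here for illustration and does not appear on the slide.

```latex
P(X_{t+1} = j \mid X_t = i, X_{t-1}, \dots, X_0) = P(X_{t+1} = j \mid X_t = i) = M_{ij},
\qquad \sum_j M_{ij} = 1,
\qquad \pi_{t+1} = \pi_t \, M .
```

Each row of M sums to 1 because the ball must end up in some hole after every hit.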

94

How can we model this?

› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0)$

› Second hit: $(0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) \cdot M = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0)$

› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08)$

with M the 9×9 transition matrix of the ball-board example (rows: 0.4 0.6 at the left border, 0.4 0.2 0.4 in the interior, 0.6 0.4 at the right border).
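The vector-matrix products for the ball-board example can be checked with a few lines of plain Python. This is a minimal sketch; the helper names `transition_matrix` and `step` are made up for this illustration.

```python
def transition_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Row-stochastic matrix of the ball-board example (row = from, column = to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = border_move, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m


def step(dist, m):
    """One hit: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


M = transition_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third hole

after_1 = step(start, M)
after_2 = step(after_1, M)

dist = start
for _ in range(40):
    dist = step(dist, M)
after_40 = dist

# The vector shown for the 40th hit is a fixed point of M (stationary distribution).
stationary = [0.08] + [0.12] * 7 + [0.08]
fixed_point_error = max(abs(a - b) for a, b in zip(step(stationary, M), stationary))

print([round(x, 2) for x in after_1])
print([round(x, 2) for x in after_2])
print([round(x, 3) for x in after_40], fixed_point_error)
```

Repeated multiplication moves the distribution towards the stationary vector, so the values after 40 hits are close to it but not necessarily identical.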

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– Transition matrix can change over time and for different object groups
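The slide does not say how the transition probabilities are learned. One simple possibility, shown here purely as an assumption, is a relative-frequency estimate: count how often each state-to-state transition occurs in already discretized historical trajectories and normalize per source state. The function name and the sample data are hypothetical.

```python
from collections import Counter


def estimate_transition_matrix(trajectories, n_states):
    """Relative-frequency estimate of a row-stochastic transition matrix.

    trajectories: iterable of state-index sequences, one state per tick.
    Rows of states that never occur as a source are left as all zeros.
    """
    counts = Counter()
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[(src, dst)] += 1

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for src in range(n_states):
        total = sum(counts[(src, dst)] for dst in range(n_states))
        if total:
            for dst in range(n_states):
                matrix[src][dst] = counts[(src, dst)] / total
    return matrix


# Hypothetical historical data: three short trajectories over states 0, 1, 2.
history = [[0, 2, 1, 0, 2], [1, 2, 2, 1], [2, 1, 0]]
M_hat = estimate_transition_matrix(history, n_states=3)
for row in M_hat:
    print([round(p, 2) for p in row])
```

A dense list of lists is used only for readability. Since the slide describes the matrix as very sparse, a real system would more likely store only the non-zero entries.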

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: state diagram over s1, s2, s3 with the transition probabilities of M as edge labels]

Note: We have an exponential number of possible paths the car might take

98

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

t=0: (1, 0, 0)

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

99

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

t=0: (1, 0, 0)
t=1: (0, 0, 1)

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

100

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

102

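ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3, propagating only the worlds that did not hit the window at t=2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 1 – 0.04 = 0.96

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

› Solution based on matrix multiplications: introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices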
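As a cross-check of the result 0.96, the query can be answered by brute force over all possible worlds, i.e. all paths of length three starting in s1. The sketch below is only meant to illustrate the possible-worlds semantics; its cost grows exponentially with the time horizon, which is what the matrix-based solution avoids. The constant and function names are introduced here for illustration.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
START = 0                # car observed in s1 at t=0 (index 0)
WINDOW_STATES = {0, 1}   # s1, s2
WINDOW_TIMES = {2, 3}    # T = [2,3]
HORIZON = 3


def window_probability_by_enumeration():
    """Sum the probabilities of all paths that touch the window (exponential cost)."""
    total = 0.0
    for tail in product(range(len(M)), repeat=HORIZON):
        path = (START,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= M[a][b]
        # A world is a hit if the car is in a window state at any window time.
        if any(path[t] in WINDOW_STATES for t in WINDOW_TIMES):
            total += prob
    return total


result = window_probability_by_enumeration()
print(round(result, 2))
```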

103

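ST - Window Queries

$$
M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)

[Figure: states s1, s2, s3 and the additional state s4]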
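The sequence of vectors can be reproduced with the two augmented matrices. The sketch assumes, as a reading of the slide rather than something it states explicitly, that M⁺ is applied for every transition that arrives at a timestamp inside the query interval and M⁻ otherwise. The case of a start timestamp that already lies inside the window is not handled.

```python
M_MINUS = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
M_PLUS = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.4, 0.6],
          [0.0, 0.0, 0.2, 0.8],
          [0.0, 0.0, 0.0, 1.0]]


def step(dist, m):
    """Multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def propagate(start, window_times, horizon):
    """Assumption: use M_PLUS when the arrival tick lies in the query interval."""
    dist = list(start)
    for t in range(1, horizon + 1):
        dist = step(dist, M_PLUS if t in window_times else M_MINUS)
    return dist


final = propagate([1.0, 0.0, 0.0, 0.0], window_times={2, 3}, horizon=3)
print([round(x, 2) for x in final])
```

The last component is the probability mass collected in the absorbing state s4, i.e. the probability that the trajectory intersects the window. The work is linear in the number of ticks instead of exponential in the number of paths.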

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2·|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3]

106

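Multiple Observations

Classes: (S1-, S2-, S3-, S1+, S2+, S3+)

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
$$

[Figure: states s1, s2, s3 unrolled over t=0, t=1, t=2, t=3; label "not ∎"]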

107

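Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

$$
P(\blacksquare \mid \text{obs})
= \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})}
= \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})}
= \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg \blacksquare)}
$$

with ∎ = the trajectory intersects the query window and obs = the object is seen in s3 at t=3.

Classes (S1-, S2-, S3-, S1+, S2+, S3+) at t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

$$= \frac{0.32}{0.32 + 0.04} = 0.89$$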
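The class-based bookkeeping and the final conditioning step can be sketched as follows. Instead of materializing the 6×6 matrices, the code applies the same rule directly: worlds in a "−" class move to the "+" class of their target state when they enter a window state at a window time, and "+" worlds stay "+". Variable and function names are chosen for this illustration only.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}   # s1, s2
N = len(M)


def step(dist, in_window_time):
    """dist = [S1-, S2-, S3-, S1+, S2+, S3+]; returns the distribution one tick later."""
    nxt = [0.0] * (2 * N)
    for i in range(N):
        for j in range(N):
            # Worlds that already hit the window stay in the '+' classes.
            nxt[N + j] += dist[N + i] * M[i][j]
            # '-' worlds switch to '+' when they enter a window state at a window time.
            enters = in_window_time and j in WINDOW_STATES
            nxt[(N + j) if enters else j] += dist[i] * M[i][j]
    return nxt


dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # observed in s1 at t=0
for t in (1, 2, 3):
    dist = step(dist, in_window_time=t in {2, 3})

observed = 2                              # second observation: s3 at t=3
p_hit_given_obs = dist[N + observed] / (dist[N + observed] + dist[observed])
print([round(x, 2) for x in dist], round(p_hit_given_obs, 2))
```

Conditioning on the observation only needs the two classes of the observed state, S3+ and S3-, which is why the vector has to keep hit and non-hit worlds apart per state.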

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on R-Tree, indexing the ST-Space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)

[Figure: location over time; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo-Sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of samples that satisfy the query and total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: Adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
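The basic Monte-Carlo idea can be sketched on the three-state chain used for the window queries. This minimal example samples forward from a single observation only; it does not implement the adapted transition matrices of the cited paper, which are needed so that samples conform with several observations. Function names, the seed and the sample size are arbitrary choices for this illustration.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def sample_trajectory(start, horizon, rng):
    """Draw one possible world: a path of horizon + 1 states starting in start."""
    path = [start]
    for _ in range(horizon):
        path.append(rng.choices(range(len(M)), weights=M[path[-1]])[0])
    return path


def estimate_window_probability(n_samples, rng):
    """Fraction of sampled worlds in s1 or s2 (indices 0, 1) at t=2 or t=3."""
    hits = 0
    for _ in range(n_samples):
        path = sample_trajectory(start=0, horizon=3, rng=rng)
        if any(path[t] in {0, 1} for t in (2, 3)):
            hits += 1
    return hits / n_samples


rng = random.Random(42)
estimate = estimate_window_probability(20000, rng)
print(estimate)
```

The estimate should lie close to the exact value 0.96 obtained by matrix multiplication, with an error that shrinks as the number of samples grows.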

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› Event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› Query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric Models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on Stochastic Processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 90: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

90

Stochastic Processesrsaquo Stochastic Processes are used to represent the

evolution of some random value or system over time

rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time

rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)

Doob J L (1953) Stochastic Processes Wiley

91

A simple example

rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities

rsaquo At the border of the wood board these probabilities are different

rsaquo This model is usually learned or given by experts

04 0402

06 04

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this?

› First hit

\[
(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ \ 0.4\ \ 0.2\ \ 0.4\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)
\]

› Second hit

\[
(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{2} = (0.16\ \ 0.16\ \ 0.36\ \ 0.16\ \ 0.16\ \ 0\ \ 0\ \ 0\ \ 0)
\]

› 40th hit

\[
(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.12\ \ 0.08)
\]

Here M is the 9 × 9 transition matrix of the board (rows: from bucket, columns: to bucket).
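The hit-by-hit computation can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the tutorial: the helper names `board_matrix`, `step` and `propagate` are made up here, and the matrix is rebuilt from the probabilities given for the board (0.4 / 0.2 / 0.4 in the interior, 0.4 / 0.6 at the border).

```python
def board_matrix(n=9):
    """Transition matrix of the board: row = from bucket, column = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                      # left border
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                # right border
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                           # interior bucket
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(p, M):
    """One hit of the board: row vector p times matrix M."""
    n = len(M)
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(p, M, hits):
    """Distribution after the given number of hits, i.e. p * M^hits."""
    for _ in range(hits):
        p = step(p, M)
    return p


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]     # ball starts in the third bucket
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
for label, p in (("1st", after_1), ("2nd", after_2), ("40th", after_40)):
    print(label, [round(x, 2) for x in p])
```

The vector (0.08, 0.12, ..., 0.12, 0.08) reported for the 40th hit is a fixed point of M: multiplying it with M returns the same vector, so further hits no longer change the distribution.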

95

Fusion of Model and Reality

› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is e.g. 20 sec

› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
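One simple way to learn transition probabilities from historical data is to count observed state-to-state transitions per tick and normalize per source state. The sketch below assumes exactly that (a maximum-likelihood count estimate); the tutorial does not say which estimator is used. The state names `a`, `b`, `c` and the `history` trajectories are invented for illustration, and the nested-dictionary layout is one possible way to store a very sparse matrix.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Count transitions between consecutive ticks and normalize each row."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        # Only observed transitions are stored, so the matrix stays sparse.
        matrix[src] = {dst: c / total for dst, c in row.items()}
    return matrix


# Hypothetical historical trajectories, one state per tick.
history = [
    ["a", "b", "c", "c"],
    ["a", "b", "b", "c"],
    ["b", "c", "a", "b"],
]
P = estimate_transitions(history)
for src in sorted(P):
    print(src, P[src])
```

Separate matrices per time of day or per object group can be obtained by calling the same function on the corresponding subsets of the history.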

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: state graph over s1, s2, s3 with the transition probabilities of M as edge labels]

Note: We have an exponential number of possible paths the car might take
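To make the note concrete, the query can be answered by brute force: enumerate every path, multiply the transition probabilities along it, and add up the paths that touch the window. The sketch below assumes the car starts in s1 at t=0 (states are indexed from 0, so s1 is index 0); the function name `window_probability_bruteforce` is made up for this illustration.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def window_probability_bruteforce(M, start, horizon, window_states, window_times):
    """Sum the probabilities of all possible worlds (paths) that hit the window."""
    n = len(M)
    hit_probability = 0.0
    possible_worlds = 0
    # n ** horizon candidate paths: this is the exponential blow-up.
    for path in product(range(n), repeat=horizon):
        states = (start,) + path
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= M[a][b]
        if prob == 0.0:
            continue                    # impossible path, not a possible world
        possible_worlds += 1
        if any(states[t] in window_states for t in window_times):
            hit_probability += prob
    return hit_probability, possible_worlds


p_hit, n_worlds = window_probability_bruteforce(
    M, start=0, horizon=3, window_states={0, 1}, window_times=[2, 3])
print(round(p_hit, 2), n_worlds)
```

Already for three states and three ticks the loop visits 27 candidate paths; the number grows as |S|^horizon, which is why enumeration is not a practical approach for longer time intervals.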


101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

› The car starts in s1 at t=0; multiplying the state distribution with M at every tick gives
– t=0: (1, 0, 0)
– t=1: (0, 0, 1)
– t=2: (0, 0.8, 0.2)
– t=3: (0.48, 0.16, 0.36)

102

ST - Window Queries

› Worlds that are in s1 or s2 at t=2 already satisfy the query, so only the remaining mass (0, 0, 0.2) is propagated to t=3, giving (0.0, 0.16, 0.04)

› Only the mass 0.04 that is still in s3 at t=3 never enters the window: Result = 1 − 0.04 = 0.96

› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

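ST - Window Queries

› States s1, s2, s3 plus the new winner state s4

\[
M^{-} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0.6 & 0 & 0.4 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.2 & 0.8 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

› M− is applied for a transition to a time outside T; M+ is applied for a transition to a time inside T and redirects all mass entering s1 or s2 to s4

› State distribution over (s1, s2, s3, s4)
– t=0: (1, 0, 0, 0)
– t=1: (0, 0, 1, 0)
– t=2: (0, 0, 0.2, 0.8)
– t=3: (0, 0, 0.04, 0.96)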
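The two matrices can be derived mechanically from M and the window states. The following sketch is an illustration under stated assumptions, not the authors' implementation: the function names are invented, states are indexed from 0 (s1 is index 0, the winner state s4 is index 3), and the window is assumed not to contain t=0.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def build_window_matrices(M, window_states):
    """Extend M by an absorbing winner state and build M- and M+."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: ordinary transitions, the winner state only loops to itself.
    m_minus = [row[:] + [0.0] for row in M] + [absorbing[:]]
    # M+: mass that would enter a window state is redirected to the winner state.
    m_plus = []
    for row in M:
        new_row = [0.0 if j in window_states else row[j] for j in range(n)]
        new_row.append(sum(row[j] for j in window_states))
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def vec_mat(p, A):
    """Row vector times matrix."""
    return [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def window_probability(M, start, horizon, window_states, window_times):
    """Final distribution; the last entry is the probability of hitting the window."""
    m_minus, m_plus = build_window_matrices(M, window_states)
    p = [0.0] * (len(M) + 1)
    p[start] = 1.0                      # assumes t=0 is not part of the window
    for t in range(1, horizon + 1):
        p = vec_mat(p, m_plus if t in window_times else m_minus)
    return p


m_minus, m_plus = build_window_matrices(M, {0, 1})
final = window_probability(M, start=0, horizon=3, window_states={0, 1}, window_times={2, 3})
print([round(x, 2) for x in final])
```

Each tick costs one vector-matrix product, so the effort grows linearly with the length of the time horizon instead of exponentially with the number of paths.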

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time, one with a single observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2·|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]

[Figure: states s1, s2, s3 over the time steps t=0, t=1, t=2, t=3]

106

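Multiple Observations

› Vectors over the classes (S1−, S2−, S3−, S1+, S2+, S3+), where ∎ denotes that the window has been intersected and the classes Si− correspond to ¬∎
– t=0: (1, 0, 0, 0, 0, 0)
– t=1: (0, 0, 1, 0, 0, 0)
– t=2: (0, 0, 0.2, 0, 0.8, 0)
– t=3: (0, 0.16, 0.04, 0.48, 0, 0.32)

\[
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]

\[
M = \begin{pmatrix}
0 & 0 & 1 \\
0.6 & 0 & 0.4 \\
0 & 0.8 & 0.2
\end{pmatrix}
\]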

107

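Bayes’ Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

\[
P(\blacksquare \mid \mathrm{obs})
= \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})}
= \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}
\]

where ∎ is the event that the trajectory passes the query window and obs is the observation that the object was seen in s3.

› With the vector (0, 0.16, 0.04, 0.48, 0, 0.32) over the classes (S1−, S2−, S3−, S1+, S2+, S3+):

\[
\frac{0.32}{0.32 + 0.04} \approx 0.89
\]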
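The doubled state space and the final conditioning step can be sketched as follows. Two assumptions are inferred from the numbers in the example rather than stated there: the query window covers s1 and s2 at t=2 only (so M+ is applied once, for the transition into t=2), and the second observation in s3 is made at t=3. The function names are made up for this illustration and states are indexed from 0.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def class_matrices(M, window_states):
    """Matrices over the 2*|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    zeros = [0.0] * n
    # M-: the target time is outside the window, "-" and "+" classes evolve separately.
    m_minus = [row[:] + zeros for row in M] + [zeros + row[:] for row in M]
    # M+: mass moving from a "-" class into a window state switches to the "+" class.
    m_plus = []
    for row in M:
        miss = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(miss + hit)
    m_plus += [zeros + row[:] for row in M]
    return m_minus, m_plus


def vec_mat(p, A):
    """Row vector times matrix."""
    return [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


n = len(M)
m_minus, m_plus = class_matrices(M, window_states={0, 1})
p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # first observation: s1 at t=0, no hit yet
for t in (1, 2, 3):
    p = vec_mat(p, m_plus if t == 2 else m_minus)   # assumed window time: t=2 only

observed = 2                             # second observation: s3 (index 2) at t=3
posterior_hit = p[n + observed] / (p[observed] + p[n + observed])
print([round(x, 2) for x in p], round(posterior_hit, 2))
```

Conditioning on the observation simply discards every class that does not belong to the observed state and renormalizes the two that remain, S3− and S3+.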

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index over the spatio-temporal space; axes: time, location]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
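One standard way to adapt transition matrices so that every sampled trajectory conforms with a later observation is to weight each transition by the probability of still reaching that observation (a backward pass, followed by renormalization). The sketch below uses this construction as an assumption; it is not claimed to be the exact procedure of the cited paper. It reuses the three-state car example, assumes observations of s1 at t=0 and s3 at t=3, and estimates the probability of being in s1 or s2 at t=2. All function names are invented for this illustration.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def backward_probabilities(M, obs_state, obs_time):
    """beta[t][i] = P(object is in obs_state at obs_time | object is in state i at time t)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(obs_time + 1)]
    beta[obs_time][obs_state] = 1.0
    for t in range(obs_time - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(M[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def adapted_matrices(M, beta):
    """One matrix per tick with transition probabilities conditioned on the observation."""
    n = len(M)
    adapted = []
    for t in range(len(beta) - 1):
        A = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0.0:        # states that cannot reach the observation keep a zero row
                for j in range(n):
                    A[i][j] = M[i][j] * beta[t + 1][j] / beta[t][i]
        adapted.append(A)
    return adapted


def sample_trajectory(start, adapted, rng):
    """Draw one trajectory; by construction it ends in the observed state."""
    states = [start]
    for A in adapted:
        states.append(rng.choices(range(len(A)), weights=A[states[-1]])[0])
    return states


rng = random.Random(0)                  # fixed seed for reproducibility
beta = backward_probabilities(M, obs_state=2, obs_time=3)
adapted = adapted_matrices(M, beta)
samples = [sample_trajectory(0, adapted, rng) for _ in range(5000)]
hits = sum(1 for s in samples if s[2] in (0, 1))   # in s1 or s2 at t=2
estimate = hits / len(samples)
print(round(estimate, 2))
```

Compared with rejection sampling on the original matrix, no sample is wasted: here only 36% of the unconditioned trajectories would end in s3 at t=3 (beta[0][0] = 0.36), and the rest would have to be thrown away.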

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g. “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

Page 92: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

92

04 0402

016 016036 016016

012008 012 012 012 012 012 012 008

A simple example

rsaquo Initial Position

rsaquo After first hit

rsaquo After second hit

rsaquo After 40th hit

93

How can we model this

rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)

rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0

04 02 04 0 0 0 0 0 0

0 04 02 04 0 0 0 0 0

0 0 04 02 04 0 0 0 0

0 0 0 04 02 04 0 0 0

0 0 0 0 04 02 04 0 0

0 0 0 0 0 04 02 04 0

0 0 0 0 0 0 04 02 04

0 0 0 0 0 0 0 06 04

from

bu

cket

to bucket

= M

94

How can we model this

rsaquo First hit

rsaquo Second hit

rsaquo 40th hit

0 0 1 0 0 0 0 0 0( ) 002

04

02 0 0 0 0 0

)

( ) M40 = (

08 012 012 012 012 012 012 012 008 )

04

06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04

= (

0 0 1 0 0 0 0 0 0

( ) M = (

016 016 036 016 016 0 0 0 0 )0

04

02

04 0 0 0 0 0

95

Fusion of Model and Reality

rsaquo Discretization of time and spacendash Eg treat intersections as

states and add additional stateson long streets

ndash The time interval correspondingto a tick is eg 20 sec

rsaquo Estimation of model parametersndash Transition probabilities from one state to another are

learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different

object groups

96

Querying the Model

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012

97

ST - Window Queries

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

119872=( 0 0 106 0 040 08 02)

s1

s2

s3

10

06

06

02

04

Note We have an exponential number ofpossible paths the car might take

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

98

99

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
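
The sampling idea itself can be sketched as follows. This is an illustration under stated assumptions rather than the method of the cited paper: it samples forward from a single observation only and does not adapt the transition matrices to later observations. The function names are hypothetical, and a window predicate is used instead of a kNN predicate because its exact probability for the three-state example of this tutorial is known (0.96).

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def sample_trajectory(matrix, start, horizon, rng):
    """Draw one possible world: a state sequence for t = 0..horizon."""
    path = [start]
    for _ in range(horizon):
        weights = matrix[path[-1]]
        path.append(rng.choices(range(len(weights)), weights=weights)[0])
    return path


def estimate_probability(matrix, start, horizon, predicate, samples, seed=0):
    """Ratio of sampled worlds that satisfy `predicate` to all drawn samples."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(samples)
        if predicate(sample_trajectory(matrix, start, horizon, rng))
    )
    return hits / samples


def in_window(path):
    # Car is in s1 or s2 (indices 0, 1) at some time in T = [2,3].
    return any(path[t] in (0, 1) for t in (2, 3))


estimate = estimate_probability(M, start=0, horizon=3,
                                predicate=in_window, samples=20000)
print(estimate)  # close to the exact value 0.96
```

Any other predicate over a sampled trajectory (or over several sampled trajectories, as a kNN query would need) can be plugged in for `in_window`; the estimator stays the same.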

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric Models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on Stochastic Processes
  – allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments," in IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial …
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilities…
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty – the flip-side
  • Uncertainty – the other-flip-side
  • Outline (4)
  • So far …
  • Merge the approaches
  • A highway example…
  • A highway example… (2)
  • A highway example… (3)
  • A highway example… (4)
  • A highway example… (5)
  • A highway example… (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayes' Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)

93

How can we model this

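› A Markov Chain is a "memoryless" Stochastic Process (the next state depends only on the current state)

› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):

$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$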

94

How can we model this

› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$

› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$

› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
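
The repeated multiplication can be sketched in plain Python as below. The helper names (`step`, `propagate`, `build_bucket_matrix`) are hypothetical and chosen only for this illustration; the matrix is the nine-bucket example, with 0-based indices so that the third bucket is index 2.

```python
def step(dist, matrix):
    """One tick: multiply the row vector `dist` by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, matrix, ticks):
    """Distribution after `ticks` transitions."""
    for _ in range(ticks):
        dist = step(dist, matrix)
    return dist


def build_bucket_matrix(n=9):
    """Tridiagonal example matrix: interior buckets stay with 0.2 and move to
    either neighbor with 0.4; boundary buckets stay with 0.4, leave with 0.6."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


M = build_bucket_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # object observed in the third bucket

first = propagate(start, M, 1)
second = propagate(start, M, 2)
fortieth = propagate(start, M, 40)
print([round(p, 2) for p in first])
print([round(p, 2) for p in second])
print([round(p, 2) for p in fortieth])
```

The 40-tick values are printed rather than annotated, since they depend on how close the chain has come to its stationary distribution after 40 ticks.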

95

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec

› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups
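
One simple way to learn transition probabilities from historical data is to count state-to-state moves between consecutive ticks and normalize per source state. The sketch below assumes exactly that counting estimator; the tutorial does not say which estimator was used, and the function name and the toy trajectories are made up for illustration. A dict of dicts keeps the representation sparse.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Count moves between consecutive ticks and normalize each row.

    Returns a sparse dict-of-dicts: model[src][dst] = estimated probability.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }


# Hypothetical discretized trajectories, one state per tick.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s1", "s3", "s3", "s2"],
    ["s2", "s3", "s2", "s3"],
]
model = estimate_transitions(history)
for src in sorted(model):
    print(src, model[src])
```

Separate models per time of day or per object group, as the slide suggests, would amount to calling the same function on the corresponding subsets of the history.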

96

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

97

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$

[Figure: transition graph over the states s1, s2, s3 with edges labeled by the transition probabilities]

Note: We have an exponential number of possible paths the car might take
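
To make the exponential blow-up concrete, the naive approach can be sketched as an enumeration of all |S|^h state sequences over a horizon of h ticks. This is an illustrative baseline only, with a hypothetical function name and 0-based state indices (s1 = 0); it assumes the car starts in s1 at t=0.

```python
from itertools import product

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def window_probability_naive(matrix, start, window_states, window_times, horizon):
    """Sum the probabilities of all paths that visit a window state at a window time."""
    n = len(matrix)
    total = 0.0
    # path[t - 1] is the state at time t, for t = 1..horizon
    for path in product(range(n), repeat=horizon):
        prob, prev = 1.0, start
        for state in path:
            prob *= matrix[prev][state]
            prev = state
        if prob > 0 and any(path[t - 1] in window_states for t in window_times):
            total += prob
    return total


result = window_probability_naive(M, start=0, window_states={0, 1},
                                  window_times=(2, 3), horizon=3)
print(round(result, 2))  # 0.96
```

For three states and three ticks this visits only 27 paths, but the count grows as |S|^h, which is what motivates the matrix-based solution.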

ST - Window Queries

› Starting from s1 at t=0, the state distribution over (s1, s2, s3) is propagated by multiplying with M:
  – t=0: (1, 0, 0)
  – t=1: (0, 0, 1)
  – t=2: (0, 0.8, 0.2)
  – t=3: (0.48, 0.16, 0.36)

› Counting each world only once: 0.8 of the worlds are already in s2 at t=2; the remaining worlds contribute (0.0, 0.16, 0.04) at t=3, so Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

103

ST - Window Queries

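$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

State vectors over (s1, s2, s3, s4), where s4 is the new state for the winner trajectories:

t=0: (1, 0, 0, 0)
t=1: (0, 0, 1, 0)
t=2: (0, 0, 0.2, 0.8)
t=3: (0, 0, 0.04, 0.96)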
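
The construction can be sketched as follows. This is a minimal reading of the slide, not the authors' code: `augment` and `window_probability` are hypothetical names, states are 0-based, M- is assumed to be applied for transitions into ticks outside T and M+ for transitions into ticks inside T, and the window is assumed not to contain t=0.

```python
def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def augment(matrix, window_states, active):
    """Append an absorbing 'hit' state as the last index.

    With active=True, probability mass entering a window state is
    redirected to the hit state (M+); with active=False it is not (M-).
    """
    n = len(matrix)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (active and j in window_states) else j
            out[i][target] += matrix[i][j]
    out[n][n] = 1.0  # once a world has hit the window, it stays a hit
    return out


def window_probability(matrix, start, window_states, window_times, horizon):
    m_minus = augment(matrix, window_states, active=False)
    m_plus = augment(matrix, window_states, active=True)
    vec = [0.0] * (len(matrix) + 1)
    vec[start] = 1.0
    history = [vec]
    for t in range(1, horizon + 1):
        vec = vec_mat(vec, m_plus if t in window_times else m_minus)
        history.append(vec)
    return vec[-1], history


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p, hist = window_probability(M, start=0, window_states={0, 1},
                             window_times={2, 3}, horizon=3)
print(round(p, 2))  # 0.96
```

The cost is one sparse vector-matrix product per tick, instead of one term per possible path.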

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two panels plotting location space against time space; the first shows one observation at t0, the second shows observations at t0 and t1]

Fusion of Model and Reality

› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – Transition matrix can change over time and for different object groups
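As a minimal sketch of the parameter estimation step, transition probabilities can be obtained by counting consecutive state pairs in historical state sequences and normalizing per source state. The function name `estimate_transitions` and the toy `history` data are assumptions made for illustration; the slides do not specify how the learning is implemented. A dictionary of dictionaries is used so that only observed transitions are stored, which mirrors the sparsity of the matrix.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Return {state: {next_state: probability}} learned from state sequences.

    Each trajectory is a list of states sampled one tick apart.
    Only transitions that were actually observed are stored (sparse).
    """
    counts = defaultdict(Counter)
    for states in trajectories:
        # Every pair of consecutive states is one observed transition.
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1

    model = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        model[current] = {s: c / total for s, c in successors.items()}
    return model


# Hypothetical historical data (not taken from the slides).
history = [
    ["s1", "s3", "s2", "s1"],
    ["s1", "s3", "s3", "s2"],
    ["s2", "s3", "s2", "s1"],
]
model = estimate_transitions(history)
for state in sorted(model):
    print(state, model[state])
```

To let the matrix change over time or per object group, one could keep one such model per time slot or per group, at the cost of needing more historical data for each.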

Querying the Model

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: transition graph over the states s1, s2, s3, edges labeled with transition probabilities]

Note: We have an exponential number of possible paths the car might take.
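To make the note concrete, the following sketch answers the window query the naive way, by enumerating every possible path (possible world) and summing the probabilities of those that visit s1 or s2 at t=2 or t=3. It assumes the car starts in s1 at t=0 and that row i of M holds the transitions out of state i; the function name is hypothetical. The number of enumerated paths is |S|^horizon, which is why this approach does not scale.

```python
from itertools import product

# Transition matrix of the running example; index 0 = s1, 1 = s2, 2 = s3.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def window_probability_by_enumeration(matrix, start, horizon,
                                      window_states, window_times):
    """Sum the probabilities of all paths that hit the window.

    Returns (probability, number_of_enumerated_paths).
    """
    n = len(matrix)
    total = 0.0
    enumerated = 0
    for tail in product(range(n), repeat=horizon):
        path = (start,) + tail
        enumerated += 1
        # Probability of this path under the Markov chain.
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= matrix[a][b]
        if prob > 0.0 and any(path[t] in window_states for t in window_times):
            total += prob
    return total, enumerated


prob, enumerated = window_probability_by_enumeration(
    M, start=0, horizon=3, window_states={0, 1}, window_times=(2, 3))
print(f"{enumerated} paths enumerated, window probability {prob:.2f}")
```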

ST - Window Queries

Propagating the state distribution over (s1, s2, s3) with M, starting in s1:
  t=0: (1, 0, 0)
  t=1: (0, 0, 1)
  t=2: (0, 0.8, 0.2)
  t=3: (0.48, 0.16, 0.36)
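The sequence of state distributions at t=0, ..., t=3 can be reproduced with repeated vector-matrix multiplication. This is a small sketch in plain Python; the helper names `step` and `propagate` are assumptions made here, not names from the tutorial.

```python
# Transition matrix of the running example; row i = transitions out of s(i+1).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def step(dist, matrix):
    """Advance one tick: multiply the row vector `dist` by `matrix`."""
    columns = len(matrix[0])
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(columns)]


def propagate(start_dist, matrix, ticks):
    """Return the list of distributions for t = 0 .. ticks."""
    history = [list(start_dist)]
    for _ in range(ticks):
        history.append(step(history[-1], matrix))
    return history


history = propagate([1.0, 0.0, 0.0], M, ticks=3)
for t, dist in enumerate(history):
    print(f"t={t}: {[round(p, 2) for p in dist]}")
```

Note that these per-tick marginals are not enough to answer the window query: adding P(s1 or s2 at t=2) = 0.8 and P(s1 or s2 at t=3) = 0.64 would give 1.44, because worlds that are inside the window at both ticks are counted twice.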

ST - Window Queries

State distribution over (s1, s2, s3), starting in s1:
  t=0: (1, 0, 0)
  t=1: (0, 0, 1)
  t=2: (0, 0.8, 0.2)
  t=3: (0, 0.16, 0.04), propagating only the worlds that did not hit the window at t=2

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices

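ST - Window Queries

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distribution over (s1, s2, s3, s4), where s4 is the new state for the winner trajectories:

$$(1, 0, 0, 0) \to (0, 0, 1, 0) \to (0, 0, 0.2, 0.8) \to (0, 0, 0.04, 0.96)$$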
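The construction of the two matrices can be sketched as follows. The reading assumed here is that M- is applied for transitions into ticks outside the query window and M+ for transitions into ticks inside it, with M+ redirecting every transition that enters a window state into the absorbing state s4. Function names are hypothetical, and the sketch uses dense lists for brevity even though the tutorial stresses sparse matrices.

```python
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def vec_mat(dist, matrix):
    """Row vector times matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def build_window_matrices(matrix, window_states):
    """Return (M-, M+) with one extra absorbing 'hit' state appended."""
    n = len(matrix)
    absorbing = [0.0] * n + [1.0]
    # M-: original transitions, the hit state only loops to itself.
    m_minus = [row[:] + [0.0] for row in matrix] + [absorbing[:]]
    # M+: transitions into window states are redirected to the hit state.
    m_plus = []
    for row in matrix:
        new_row = [0.0] * (n + 1)
        for target, p in enumerate(row):
            if target in window_states:
                new_row[n] += p
            else:
                new_row[target] = p
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(matrix, start, window_states, window_times, horizon):
    """Probability that the object is in a window state at some window tick."""
    n = len(matrix)
    m_minus, m_plus = build_window_matrices(matrix, window_states)
    dist = [0.0] * (n + 1)
    if 0 in window_times and start in window_states:
        dist[n] = 1.0  # already inside the window at t=0
    else:
        dist[start] = 1.0
    for t in range(1, horizon + 1):
        dist = vec_mat(dist, m_plus if t in window_times else m_minus)
    return dist[n]


m_minus, m_plus = build_window_matrices(M, window_states={0, 1})
result = window_probability(M, start=0, window_states={0, 1},
                            window_times={2, 3}, horizon=3)
print(round(result, 2))
```

The cost is one vector-matrix product per tick, instead of one term per possible path.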

Multiple Observations

› So far we had only one observation from which we could extrapolate
› This is not really of interest since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques

[Figures: location space over time space, with one observation at t0 and with two observations at t0 and t1]

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 at the ticks t=0, t=1, t=2, t=3]

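Multiple Observations

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

State distribution over (S1-, S2-, S3-, S1+, S2+, S3+), where the classes Si- hold the worlds that have not (¬∎) intersected the window and the classes Si+ those that have (∎):
  t=0: (1, 0, 0, 0, 0, 0)
  t=1: (0, 0, 1, 0, 0, 0)
  t=2: (0, 0, 0.2, 0, 0.8, 0)
  t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)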
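A sketch of the doubled state space: states 0..|S|-1 stand for the classes Si- and states |S|..2|S|-1 for the classes Si+. M- is block diagonal, while M+ moves probability mass from a "-" class to the "+" class of the target state whenever the target lies in the window. The helper names are assumptions; the window (s1 or s2 at t=2 and t=3) and the start state s1 follow the running example.

```python
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def vec_mat(dist, matrix):
    """Row vector times matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def build_doubled_matrices(matrix, window_states):
    """Return (M-, M+) over the 2|S| classes (S1-..Sn-, S1+..Sn+)."""
    n = len(matrix)
    zeros = [0.0] * n
    # M-: block diagonal, hit status never changes outside the window.
    m_minus = ([row[:] + zeros for row in matrix]
               + [zeros + row[:] for row in matrix])
    m_plus = []
    for row in matrix:  # rows for the "not yet hit" classes Si-
        new_row = [0.0] * (2 * n)
        for target, p in enumerate(row):
            # Entering a window state switches to the "+" copy of that state.
            new_row[target + n if target in window_states else target] = p
        m_plus.append(new_row)
    for row in matrix:  # rows for the "already hit" classes Si+
        m_plus.append(zeros + row[:])
    return m_minus, m_plus


m_minus, m_plus = build_doubled_matrices(M, window_states={0, 1})
dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # o starts in s1, window not yet hit
for t in (1, 2, 3):
    dist = vec_mat(dist, m_plus if t in (2, 3) else m_minus)
    print(f"t={t}: {[round(p, 2) for p in dist]}")
```

Unlike the single absorbing state, this representation keeps the location of the hit worlds, which is what a later observation needs in order to be incorporated.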

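Bayes' Theorem

› Now what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

State distribution over (S1-, S2-, S3-, S1+, S2+, S3+): (0, 0, 0.04, 0.48, 0.16, 0.32)

With ∎ denoting the event that the trajectory passes the query window and obs the observation:

$$P(\blacksquare \mid \mathit{obs}) = \frac{P(\mathit{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{obs})} = \frac{P(\blacksquare \wedge \mathit{obs})}{P(\mathit{obs})} = \frac{P(\mathit{obs} \wedge \blacksquare)}{P(\mathit{obs} \wedge \blacksquare) + P(\mathit{obs} \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$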
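The conditioning step reduces to reading two entries of the final distribution: the mass in S3+ (hit and seen in s3) and the mass in S3- (not hit and seen in s3). A minimal sketch, with a hypothetical function name and the distribution entered by hand in the order (S1-, S2-, S3-, S1+, S2+, S3+):

```python
STATE_COUNT = 3
# Distribution at the time of the observation, order (S1-, S2-, S3-, S1+, S2+, S3+).
final_dist = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]


def hit_probability_given_state(dist, state, n):
    """P(window was hit | object observed in `state`) via Bayes' theorem."""
    miss = dist[state]      # P(obs and not hit)
    hit = dist[state + n]   # P(obs and hit)
    if hit + miss == 0.0:
        raise ValueError("observation has probability zero under the model")
    return hit / (hit + miss)


p = hit_probability_given_state(final_dist, state=2, n=STATE_COUNT)
print(round(p, 2))
```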

Summary

› Pros
  – Allows answering time-parameterized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well since it is purely based on sparse matrix multiplications
  – Natively extendable for uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect model

Selected Works

Indexing UST Data

› With the above techniques, each object in the database has to be processed
› Index structure based on an R-Tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: location over time; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
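One standard way to adapt transition matrices so that every sampled trajectory conforms to a later observation is backward conditioning: compute, for each tick t and state i, the probability beta_t(i) that the observation still occurs, and reweight each transition by beta_{t+1}(j) / beta_t(i). The sketch below follows that idea for a single exact observation at the last tick; it is an illustration under these assumptions and may differ in detail from the method in the cited paper. All function names are hypothetical, and the window query from the running example is reused as the sampled predicate.

```python
import random

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def adapted_matrices(matrix, horizon, observed_state):
    """One transition matrix per tick, conditioned on the final observation."""
    n = len(matrix)
    # beta[t][i] = P(object is in observed_state at tick horizon | state i at t)
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][observed_state] = 1.0
    for t in range(horizon - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(matrix[i][j] * beta[t + 1][j] for j in range(n))
    adapted = []
    for t in range(horizon):
        step = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0.0:  # states with beta = 0 are never visited
                for j in range(n):
                    step[i][j] = matrix[i][j] * beta[t + 1][j] / beta[t][i]
        adapted.append(step)
    return adapted


def sample_path(adapted, start, rng):
    """Draw one trajectory that is guaranteed to end in the observed state."""
    path = [start]
    for step in adapted:
        row = step[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


rng = random.Random(0)
adapted = adapted_matrices(M, horizon=3, observed_state=2)
samples = [sample_path(adapted, start=0, rng=rng) for _ in range(5000)]
# Monte-Carlo estimate: fraction of samples in s1 or s2 at t=2 or t=3.
hits = sum(1 for path in samples if path[2] in (0, 1) or path[3] in (0, 1))
estimate = hits / len(samples)
print(round(estimate, 3))
```

Compared with rejection sampling on the original chain, no sample is wasted: here only about 36% of unconditioned samples would end in s3 and be usable.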

Event queries

› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
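To illustrate how a sequence of subevents can be evaluated on a Markov model, the sketch below tracks pairs (location, number of subevents matched so far), in the spirit of the equivalent-worlds idea used for window queries. This is not Lahar's query language or algorithm; the indoor transition matrix, the room numbering, and the function name are hypothetical, and the subevents are matched in order with arbitrary other locations allowed in between.

```python
# Hypothetical indoor model: 0 = office, 1 = hallway, 2 = coffee room.
M = [
    [0.5, 0.5, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.6, 0.4],
]


def event_sequence_probability(matrix, start, pattern, ticks):
    """P(the locations in `pattern` are visited in this order within `ticks`)."""

    def advance(progress, state):
        # Move to the next subevent if the current location matches it.
        if progress < len(pattern) and state == pattern[progress]:
            return progress + 1
        return progress

    # dist maps (location, matched subevents) to probability mass.
    dist = {(start, advance(0, start)): 1.0}
    for _ in range(ticks):
        following = {}
        for (state, progress), p in dist.items():
            for target, q in enumerate(matrix[state]):
                if q > 0.0:
                    key = (target, advance(progress, target))
                    following[key] = following.get(key, 0.0) + p * q
        dist = following
    return sum(p for (_, progress), p in dist.items()
               if progress == len(pattern))


# "Joe got coffee": office, then coffee room, then office again.
p_coffee = event_sequence_probability(M, start=0, pattern=[0, 2, 0], ticks=6)
print(round(p_coffee, 4))
```

The number of tracked pairs is bounded by |S| times the pattern length plus one, so the cost per tick stays polynomial even though the number of possible paths grows exponentially.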

Summary

› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.


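ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

Initial distribution at t=0: (1, 0, 0)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3, with edges labeled by the transition probabilities]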

98

99

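ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

Initial distribution at t=0: (1, 0, 0)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3, with edges labeled by the transition probabilities]

Distribution at t=1: (0, 0, 1)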

100

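ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

Initial distribution at t=0: (1, 0, 0)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3, with edges labeled by the transition probabilities]

Distribution at t=1: (0, 0, 1)
Distribution at t=2: (0, 0.8, 0.2)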

101

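ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

Initial distribution at t=0: (1, 0, 0)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3, with edges labeled by the transition probabilities]

Distribution at t=1: (0, 0, 1)
Distribution at t=2: (0, 0.8, 0.2)
Distribution at t=3: (0.48, 0.16, 0.36)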
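The per-step distributions on these slides follow from repeatedly multiplying the row vector by M. A minimal sketch in plain Python, assuming states are indexed 0, 1, 2 for s1, s2, s3; the helper names `step` and `forward` are illustrative and not taken from the tutorial.

```python
# Transition matrix M from the slide: row i holds the probabilities of moving
# from state s(i+1) to s1, s2 and s3.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def step(dist, matrix):
    """Advance a row-vector distribution by one time step (dist * matrix)."""
    n_cols = len(matrix[0])
    return [
        sum(dist[i] * matrix[i][j] for i in range(len(dist)))
        for j in range(n_cols)
    ]


def forward(initial, matrix, steps):
    """Return the list of distributions at t = 0, 1, ..., steps."""
    dists = [list(initial)]
    for _ in range(steps):
        dists.append(step(dists[-1], matrix))
    return dists


# The car is observed in s1 at t=0.
distributions = forward([1.0, 0.0, 0.0], M, 3)
for t, dist in enumerate(distributions):
    print(t, [round(p, 2) for p in dist])
```

These vectors are marginals per time step. They cannot simply be added up to answer the window query: the mass in s1 or s2 is 0.8 at t=2 and 0.64 at t=3, and the two events overlap, so worlds that are inside the window at both times would be counted twice.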

102

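ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

Initial distribution at t=0: (1, 0, 0)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3, with edges labeled by the transition probabilities]

Distribution at t=1: (0, 0, 1)
Distribution at t=2: (0, 0.8, 0.2)
At t=3, counting only the worlds that were not already in s1 or s2 at t=2: (0, 0.16, 0.04)

Result = 0.8 + 0.16 = 0.96

› Solution based on matrix multiplications: introduces a new state for the winner trajectories and two matrices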

103

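ST - Window Queries

$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Successive distributions over the states (s1, s2, s3, s4):

(1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)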
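The construction with one extra state can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the extra state (s4 on the slide) is taken to be absorbing, M+ is assumed to apply to every transition that ends at a time inside the query interval and M- to all other transitions, and the function names are hypothetical.

```python
def step(dist, matrix):
    """Advance a row-vector distribution by one time step (dist * matrix)."""
    n_cols = len(matrix[0])
    return [
        sum(dist[i] * matrix[i][j] for i in range(len(dist)))
        for j in range(n_cols)
    ]


def window_matrices(M, window_states):
    """Build (M_minus, M_plus), each with one extra absorbing 'hit' state."""
    n = len(M)
    absorbing_row = [0.0] * n + [1.0]
    # M_minus: ordinary transitions; mass already in the hit state stays there.
    m_minus = [row[:] + [0.0] for row in M] + [absorbing_row[:]]
    # M_plus: transitions that enter a window state are redirected to 'hit'.
    m_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            if j in window_states:
                new_row[n] += p
            else:
                new_row[j] = p
        m_plus.append(new_row)
    m_plus.append(absorbing_row[:])
    return m_minus, m_plus


def window_probability(M, initial, window_states, t_start, t_end):
    """Probability of being in a window state at some t in [t_start, t_end]."""
    # Assumption of this sketch: the observation at t=0 lies before the window.
    assert 1 <= t_start <= t_end
    m_minus, m_plus = window_matrices(M, window_states)
    dist = list(initial) + [0.0]
    for t in range(1, t_end + 1):
        dist = step(dist, m_plus if t >= t_start else m_minus)
    return dist[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_MINUS, M_PLUS = window_matrices(M, {0, 1})
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(round(result, 2))
```

Because the hit state is absorbing, its mass after the last step is exactly the probability that the trajectory was inside the window at least once, which avoids enumerating possible worlds.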

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first shows a single observation at t0, the second shows observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

106

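Multiple Observations

[Figure: states s1, s2, s3 unrolled over the time steps t=0, t=1, t=2, t=3]

Vector components are ordered ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$); the classes $S_i^-$ hold the worlds that have not intersected the window.

Successive distributions:

(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$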
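The doubled state space can be generated mechanically from M and the set of window states. The following sketch assumes the component order (S1-, S2-, S3-, S1+, S2+, S3+) and that M+ is used for transitions ending at t=2 and t=3; `doubled_matrices` is a hypothetical helper name.

```python
def step(dist, matrix):
    """Advance a row-vector distribution by one time step (dist * matrix)."""
    n_cols = len(matrix[0])
    return [
        sum(dist[i] * matrix[i][j] for i in range(len(dist)))
        for j in range(n_cols)
    ]


def doubled_matrices(M, window_states):
    """Build (M_minus, M_plus) over the 2|S| classes S_i^- and S_i^+."""
    n = len(M)
    zeros = [0.0] * n
    # M_minus is block diagonal: 'not hit' worlds stay 'not hit',
    # 'hit' worlds stay 'hit'.
    m_minus = [row[:] + zeros for row in M] + [zeros + row[:] for row in M]
    m_plus = []
    for row in M:
        # Rows for S_i^-: entering a window state moves the world to S_j^+.
        new_row = [0.0] * (2 * n)
        for j, p in enumerate(row):
            new_row[j + n if j in window_states else j] = p
        m_plus.append(new_row)
    for row in M:
        # Rows for S_i^+: a world that has hit the window remains a hit world.
        m_plus.append(zeros + row[:])
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = doubled_matrices(M, {0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # observed in s1 at t=0, no hit yet
dist = step(dist, m_minus)              # t=1 lies outside T = [2,3]
dist = step(dist, m_plus)               # t=2
dist = step(dist, m_plus)               # t=3
final = [round(p, 2) for p in dist]
print(final)
```

Unlike the single absorbing state, this layout keeps the location of the hit worlds, which is what a later observation needs in order to condition on where the object actually is.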

107

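Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

Distribution over ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{obs} \wedge \text{hit})}{P(\text{obs} \wedge \text{hit}) + P(\text{obs} \wedge \neg\text{hit})} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$

Here hit denotes the event that the trajectory intersects the query window, and obs denotes the observation of the object in s3.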
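The conditioning step amounts to renormalizing over the two classes that are compatible with the observation. A small sketch, assuming the six-component ordering (S1-, S2-, S3-, S1+, S2+, S3+) used above; the function name is illustrative.

```python
def posterior_hit_given_observation(dist, observed_state, n_states):
    """P(hit | object observed in observed_state) from a 2|S|-class vector.

    dist[i] is the mass of S_i^- (no hit), dist[i + n_states] that of S_i^+.
    """
    miss = dist[observed_state]
    hit = dist[observed_state + n_states]
    total = hit + miss
    if total == 0.0:
        raise ValueError("the observation has probability zero under the model")
    return hit / total


# Distribution at t=3 from the slide; the object is then observed in s3.
final = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
posterior = posterior_hit_given_observation(final, observed_state=2, n_states=3)
print(round(posterior, 2))
```

All mass in the classes of s1 and s2 is discarded, because those worlds contradict the observation.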

108

Summary

› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)

› Cons
– Discrete time and space
– Matching from time to ticks might not be the perfect modelling

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-Tree indexing the ST-space

› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index over the spatio-temporal space; axes: time, location; labels: P, oid]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
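The sampling estimate described above can be illustrated on the window query from the earlier slides, where the exact answer (0.96) is known. This is a naive forward sampler conditioned on a single observation at t=0 only; it does not implement the adapted transition matrices of the cited paper, and all names, the sample count, and the seed are assumptions of the sketch.

```python
import random


def sample_trajectory(M, start_state, steps, rng):
    """Draw one trajectory of length steps + 1 from the Markov chain M."""
    states = [start_state]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_window_probability(M, start_state, window_states,
                                t_start, t_end, n_samples, seed=0):
    """Monte-Carlo estimate: share of sampled worlds that hit the window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        trajectory = sample_trajectory(M, start_state, t_end, rng)
        if any(trajectory[t] in window_states
               for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
estimate = estimate_window_probability(
    M, start_state=0, window_states={0, 1},
    t_start=2, t_end=3, n_samples=20000,
)
print(estimate)
```

With a second observation, most forward samples would contradict it and have to be rejected, which is the inefficiency the slide's question about conforming samples points at.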

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments,” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

Page 100: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

100

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

101

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 048016036)

102

ST - Window Queries

(100)119872=( 0 0 1

06 0 040 08 02)

t=0 t=1 t=2 t=3

rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]

s1

s2

s3

s1

s2

s3

10

06

06

02

04

(001)

( 00802)

( 00016004 )Result = 096

rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices

103

ST - Window Queries

119872minus=(0 0 1 0

06 0 04 00 08 02 00 0 0 1

)

(1000)(

0010)(

00

0208

)(00

004096

)119872

+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1

)iquest

s1

s2

s3

s4

104

Multiple Observations

rsaquo So far we had only one observationfrom which we could extrapolate

rsaquo This is not really of interest sincecars do not move randomly

rsaquo With two observations we have tointroduce more artificial states andadapt the techniques

loca

tion

spac

e

time spacet0

loca

tion

spac

e

time spacet0 t1

105

Multiple Observations

rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si

- corresponding to worlds where o is located in state si and o has not intersected the window

ndash One class Si+ corresponding to worlds where o is located in

state si and o has not intersected the window

119872=( 0 0 106 0 040 08 02)

t=0 t=1 t=2 t=3

s1

s2

s3

106

Multiple Observations

(100000)

t=0 t=1 t=2 t=3

s1

s2

s3

(001000)(

00

020

080

)(0

0 160 04048

0032

)S1

-

S2-

S3-

S1+

S2+

S3+

not∎

119872minus=(119872 00119872 )

119872

+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802

)iquest

119872=( 0 0 106 0 040 08 02)

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

Page 101: Managing Uncertainty in Spatial and Spatio-temporal Data

101

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

State distributions over (s1, s2, s3):

t=0: (1, 0, 0)

t=1: (0, 0, 1)

t=2: (0, 0.8, 0.2)

t=3: (0.48, 0.16, 0.36)

[Figure: states s1, s2, s3 unrolled over t=0, ..., t=3, with edges labeled by transition probabilities]
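The distributions on this slide follow from repeatedly multiplying the state vector by M. A minimal sketch in plain Python, assuming the car starts in s1 at t=0 and that M has rows (0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2); the function names `step` and `propagate` are illustrative and do not come from the tutorial:

```python
# Transition matrix from the slide: M[i][j] = P(next state is s_(j+1) | current state is s_(i+1)).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def step(dist, matrix):
    """Propagate a row vector of state probabilities by one tic (dist * matrix)."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def propagate(initial, matrix, tics):
    """Return the state distributions for t = 0, 1, ..., tics."""
    dists = [list(initial)]
    for _ in range(tics):
        dists.append(step(dists[-1], matrix))
    return dists


# Assumption: the object is observed in s1 at t=0, i.e. the start vector is (1, 0, 0).
dists = propagate([1.0, 0.0, 0.0], M, 3)
for t, dist in enumerate(dists):
    print(t, [round(p, 4) for p in dist])
```

Note that these per-timestamp distributions alone do not answer the window query: adding the probabilities of s1 and s2 at t=2 and t=3 would count worlds that are in the window at both times twice.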

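102

ST - Window Queries

› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2,3]?

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

Only the worlds that have not yet entered s1 or s2 during T are propagated further:

t=0: (1, 0, 0)

t=1: (0, 0, 1)

t=2: (0, 0.8, 0.2), of which 0.8 (in s2) satisfies the query

t=3: (0, 0.16, 0.04), of which 0.16 (in s2) satisfies the query

Result = 0.8 + 0.16 = 0.96

[Figure: states s1, s2, s3 unrolled over t=0, ..., t=3, with edges labeled by transition probabilities]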
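The result can be cross-checked directly against possible worlds semantics by enumerating every trajectory and summing the probabilities of those that visit s1 or s2 at t=2 or t=3. The sketch below is a brute-force check only, exponential in the number of tics, and not a method proposed in the tutorial; the function name and the 0-based state indices are assumptions made for illustration:

```python
from itertools import product

# Transition matrix from the slide (states s1, s2, s3 are indices 0, 1, 2).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def window_probability_by_enumeration(matrix, start, window_states, window_times, horizon):
    """Sum the probabilities of all possible worlds (paths) that hit the window."""
    total = 0.0
    for tail in product(range(len(matrix)), repeat=horizon):
        path = (start,) + tail
        prob = 1.0
        for current, nxt in zip(path, path[1:]):
            prob *= matrix[current][nxt]
        # A world is a hit if it is in a window state at any time of the interval.
        if prob > 0.0 and any(path[t] in window_states for t in window_times):
            total += prob
    return total


# Car starts in s1 at t=0; window = {s1, s2} during T = [2, 3].
result = window_probability_by_enumeration(M, 0, {0, 1}, [2, 3], 3)
print(round(result, 4))
```

Only four of the 27 candidate paths have non-zero probability here, and the single one that misses the window (s1, s3, s3, s3) has probability 0.04.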

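103

ST - Window Queries

› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories (those that have already intersected the window) and two matrices

$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

State distributions over (s1, s2, s3, s4):

t=0: (1, 0, 0, 0)

t=1: (0, 0, 1, 0)

t=2: (0, 0, 0.2, 0.8)

t=3: (0, 0, 0.04, 0.96)

The step into t=1 lies outside T and uses M^-; the steps into t=2 and t=3 lie inside T and use M^+, which redirects every transition into s1 or s2 to the absorbing state s4.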
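A minimal sketch of how the two matrices could be derived from M and applied, assuming the query interval starts after t=0 and the window is given as a set of 0-based state indices. The helper names `build_window_matrices` and `window_probability` are hypothetical and not taken from the tutorial or the cited papers:

```python
def step(dist, matrix):
    """Propagate a row vector of probabilities by one tic (dist * matrix)."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def build_window_matrices(matrix, window_states):
    """Add one absorbing 'hit' state and return (M_minus, M_plus)."""
    n = len(matrix)
    absorbing_row = [0.0] * n + [1.0]
    # M_minus: unchanged transitions; the hit state only loops to itself.
    m_minus = [list(row) + [0.0] for row in matrix] + [absorbing_row]
    # M_plus: transitions into window states are redirected to the hit state.
    m_plus = []
    for row in matrix:
        kept = [0.0 if j in window_states else row[j] for j in range(n)]
        redirected = sum(row[j] for j in window_states)
        m_plus.append(kept + [redirected])
    m_plus.append(list(absorbing_row))
    return m_minus, m_plus


def window_probability(matrix, initial, window_states, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]); assumes t_start >= 1."""
    m_minus, m_plus = build_window_matrices(matrix, window_states)
    dist = list(initial) + [0.0]
    for t in range(1, t_end + 1):
        dist = step(dist, m_plus if t >= t_start else m_minus)
    return dist[-1]


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
M_MINUS, M_PLUS = build_window_matrices(M, {0, 1})
result = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)
print(round(result, 4))
```

The cost is one vector-matrix product per tic, in contrast to enumerating all possible worlds, and the probability mass of the extra state after the last tic of T is the query result.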

104

Multiple Observations

› So far we had only one observation from which we could extrapolate

› This is not really of interest since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space, one with an observation at t0 and one with observations at t0 and t1]

105

Multiple Observations

› We need to track where true hit worlds are located

– 2|S| classes of equivalent worlds

– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window

– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 unrolled over t=0, ..., t=3]

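106

Multiple Observations

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$

State distributions over ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$):

t=0: (1, 0, 0, 0, 0, 0)

t=1: (0, 0, 1, 0, 0, 0)

t=2: (0, 0, 0.2, 0, 0.8, 0)

t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)

[Figure: states s1, s2, s3 unrolled over t=0, ..., t=3; the classes $S_1^-$, $S_2^-$, $S_3^-$ are labeled ¬∎]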

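107

Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window (event ∎), given the fact that the object was seen in s3 at t=3 (event obs)?

$$P(\blacksquare \mid \mathrm{obs}) = \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})} = \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})} = \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)}$$

Distribution at t=3 over ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$): (0, 0, 0.04, 0.48, 0.16, 0.32)

$$P(\blacksquare \mid \mathrm{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$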
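The class-based propagation and the final Bayes step can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: the second observation is taken to be exact and to fall on the last tic of the query interval, the interval starts after t=0, and the helper names are hypothetical:

```python
def step(dist, matrix):
    """Propagate a row vector of probabilities by one tic (dist * matrix)."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def build_class_matrices(matrix, window_states):
    """Return (M_minus, M_plus) over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(matrix)
    zeros = [0.0] * n
    # M_minus is block diagonal: a world keeps its hit / no-hit status.
    m_minus = ([list(row) + zeros for row in matrix]
               + [zeros + list(row) for row in matrix])
    # M_plus moves a no-hit world into the hit class when it enters a window state.
    m_plus = []
    for row in matrix:
        miss = [0.0 if j in window_states else row[j] for j in range(n)]
        hit = [row[j] if j in window_states else 0.0 for j in range(n)]
        m_plus.append(miss + hit)
    m_plus += [zeros + list(row) for row in matrix]
    return m_minus, m_plus


def hit_probability_given_observation(matrix, initial, window_states,
                                      t_start, t_end, observed_state):
    """P(window hit | object observed in observed_state at t_end); assumes t_start >= 1."""
    n = len(matrix)
    m_minus, m_plus = build_class_matrices(matrix, window_states)
    dist = list(initial) + [0.0] * n
    for t in range(1, t_end + 1):
        dist = step(dist, m_plus if t >= t_start else m_minus)
    miss = dist[observed_state]      # P(obs and no hit)
    hit = dist[n + observed_state]   # P(obs and hit)
    return dist, hit / (hit + miss)  # assumes the observation has non-zero probability


M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
final_dist, posterior = hit_probability_given_observation(
    M, [1.0, 0.0, 0.0], {0, 1}, 2, 3, observed_state=2)
print([round(p, 2) for p in final_dist], round(posterior, 2))
```

Conditioning on the observation only requires the two entries of the final vector that belong to the observed state, which is why the hit and no-hit worlds have to be tracked per state rather than in a single absorbing state.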

108

Summary

› Pros

– Allows answering time-parameterized queries according to possible worlds semantics

– Considers location dependencies over time

– Scales up very well since it is purely based on sparse matrix multiplications

– Natively extendable to uncertain observations

– Seems to work adequately on real-world data (more validation needed)

› Cons

– Discrete time and space

– Mapping time to discrete tics might not be the ideal model

109

Selected Works

110

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)

[Figure: index illustration over the axes time and location; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

111

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte Carlo sampling

– Draw a sufficiently high number of samples

– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform to the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
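The sampling ratio can be illustrated on the window query from the earlier slides, whose exact answer (0.96) is known. The sketch below uses plain forward sampling from a single observation at t=0. It is not the adapted-transition-matrix technique of the cited paper, which is needed when samples must also agree with later observations. Function names, the seed, and the sample count are assumptions chosen for illustration:

```python
import random

# Transition matrix of the running example (states s1, s2, s3 are indices 0, 1, 2).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def sample_trajectory(matrix, start_state, tics, rng):
    """Draw one possible trajectory by following the Markov chain for `tics` steps."""
    path = [start_state]
    for _ in range(tics):
        weights = matrix[path[-1]]
        path.append(rng.choices(range(len(matrix)), weights=weights)[0])
    return path


def estimate_window_probability(matrix, start_state, window_states,
                                window_times, num_samples, seed=0):
    """Monte Carlo estimate: fraction of sampled trajectories that satisfy the query."""
    rng = random.Random(seed)
    horizon = max(window_times)
    hits = 0
    for _ in range(num_samples):
        path = sample_trajectory(matrix, start_state, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / num_samples


estimate = estimate_window_probability(M, 0, {0, 1}, [2, 3], num_samples=20000)
print(estimate)
```

With n independent samples the standard error of the estimate is sqrt(p(1-p)/n), about 0.0014 for p = 0.96 and n = 20000, so the estimate should land close to the exact result.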

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data

– Cleaning

– Approximation

– Paradigm of equivalent worlds

› Geometric models can be used

– when no probabilistic model is present

– to find possible answers (no confidence)

› Probabilistic management of UST data

– based on stochastic processes

– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

Multiple Observations

› So far we had only one observation, from which we could extrapolate

› This is not really of interest, since cars do not move randomly

› With two observations we have to introduce more artificial states and adapt the techniques

[Figure: two plots of location space over time space; the first marks time t0, the second marks times t0 and t1]

Multiple Observations

› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: grid of states s1, s2, s3 over times t=0, t=1, t=2, t=3]

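Multiple Observations

[Figure: grid of states s1, s2, s3 over times t=0, t=1, t=2, t=3]

State vectors over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$, where the $S_i^-$ classes hold the "not ∎" worlds (window not intersected):

t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0.16, 0.04, 0.48, 0, 0.32)

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$

$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$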
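The propagation on this slide can be retraced with a few lines of code. The following is a minimal sketch, not the authors' implementation: it builds the two augmented matrices from M and multiplies the state vector through t=1..3. The window layout (states s1 and s2, at time t=2 only) is an assumption inferred from the numbers on the slide, and the helper names are hypothetical.

```python
# Transition matrix M of the running example over states s1, s2, s3
# (rows: current state, columns: next state).
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def build_m_minus(m):
    """Block-diagonal matrix (M 0; 0 M): no world changes its hit status."""
    n = len(m)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = m[i][j]          # S_i^- -> S_j^-
            out[n + i][n + j] = m[i][j]  # S_i^+ -> S_j^+
    return out


def build_m_plus(m, window_states):
    """Worlds that enter a window state move to the '+' classes."""
    n = len(m)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            target = n + j if j in window_states else j
            out[i][target] = m[i][j]     # S_i^- -> S_j^+ if s_j lies in the window
            out[n + i][n + j] = m[i][j]  # worlds that already hit stay hit
    return out


def step(vec, mat):
    """Multiply a row vector with a matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


# Assumption (inferred from the slide's numbers, not stated on it):
# the query window covers states s1 and s2 (indices 0 and 1) at time t=2.
WINDOW_STATES = {0, 1}
WINDOW_TIMES = {2}

M_MINUS = build_m_minus(M)
M_PLUS = build_m_plus(M, WINDOW_STATES)


def propagate(start, t_end):
    """Return the state vectors for t = 0 .. t_end."""
    vectors = [start]
    for t in range(1, t_end + 1):
        mat = M_PLUS if t in WINDOW_TIMES else M_MINUS
        vectors.append(step(vectors[-1], mat))
    return vectors


# Object observed in s1 at t=0, not yet inside the window.
vectors = propagate([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 3)
for t, v in enumerate(vectors):
    print(t, [round(x, 2) for x in v])
```

Under these assumptions the loop uses M+ only for the transition into t=2 and M- everywhere else, which is what keeps the work down to one vector-matrix product per time step.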

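Bayes' Theorem

› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?

With ∎ denoting the event that the trajectory intersects the query window and obs the observation of the object in s3:

$$P(\blacksquare \mid \mathrm{obs}) = \frac{P(\mathrm{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathrm{obs})} = \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs})} = \frac{P(\blacksquare \wedge \mathrm{obs})}{P(\mathrm{obs} \wedge \blacksquare) + P(\mathrm{obs} \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} = 0.89$$

State vector at t=3 over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: (0, 0.16, 0.04, 0.48, 0, 0.32)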
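The conditioning step above only needs two entries of the state vector: the '+' and '-' class of the observed state. A minimal sketch, assuming the class ordering (S1-, S2-, S3-, S1+, S2+, S3+) and a hypothetical helper name:

```python
def hit_probability_given_state(vec, state_index):
    """P(window was hit | object observed in the given state).

    vec holds the probabilities of the '-' classes followed by the
    '+' classes, so a vector over |S| states has 2|S| entries.
    """
    n = len(vec) // 2
    p_obs_and_miss = vec[state_index]     # class S_i^-
    p_obs_and_hit = vec[n + state_index]  # class S_i^+
    p_obs = p_obs_and_hit + p_obs_and_miss
    if p_obs == 0:
        raise ValueError("observation has probability zero under the model")
    return p_obs_and_hit / p_obs


# State vector at t=3 from the running example; s3 has index 2.
v3 = [0.0, 0.16, 0.04, 0.48, 0.0, 0.32]
posterior = hit_probability_given_state(v3, 2)
print(round(posterior, 2))  # 0.89
```

The guard for a zero denominator covers observations that contradict the model, for which the conditional probability is undefined.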

Summary

› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)

› Cons
  – Discrete time and space
  – Matching time to tics might not be the perfect modelling

Selected Works

Indexing UST Data

› With the above techniques, each object in the database has to be processed

› Index structure based on an R-tree indexing the ST-space

› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)

[Figure: index illustration with axes time and location; label "Poid"]

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.

KNN queries + Sampling on UST Data

› Not all queries can be solved as elegantly as window queries

› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples

› But how to draw samples efficiently such that they conform with the observations?

› Solution: adaptation of transition matrices

Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
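To make the sampling idea concrete, here is a minimal sketch of the naive baseline: draw trajectories from the unmodified Markov chain and reject those that contradict the observation. This is plain rejection sampling, not the adapted-transition-matrix technique of the cited paper, and it is applied to the window query of the running example (rather than a kNN query) so that the estimate can be compared with the exact value 0.89. The window layout, the observation, and all function names are assumptions made for illustration.

```python
import random

# Transition matrix of the running example over states s1, s2, s3.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def sample_trajectory(m, start, steps, rng):
    """Draw one trajectory (list of state indices) from the Markov chain."""
    traj = [start]
    for _ in range(steps):
        row = m[traj[-1]]
        traj.append(rng.choices(range(len(row)), weights=row)[0])
    return traj


def estimate_hit_probability(m, start, steps, observation, window,
                             num_samples, rng):
    """Estimate P(trajectory hits the window | observation) by rejection.

    observation: (time, state) pair the trajectory must pass through.
    window: set of (time, state) pairs that count as a hit.
    """
    t_obs, s_obs = observation
    accepted = hits = 0
    for _ in range(num_samples):
        traj = sample_trajectory(m, start, steps, rng)
        if traj[t_obs] != s_obs:
            continue  # sample contradicts the observation: reject it
        accepted += 1
        if any((t, s) in window for t, s in enumerate(traj)):
            hits += 1
    if accepted == 0:
        return float("nan")  # no sample was consistent with the observation
    return hits / accepted


rng = random.Random(42)
estimate = estimate_hit_probability(
    M,
    start=0,                   # object seen in s1 at t=0
    steps=3,
    observation=(3, 2),        # object seen again in s3 at t=3
    window={(2, 0), (2, 1)},   # assumed window: s1 or s2 at t=2
    num_samples=20000,
    rng=rng,
)
print(estimate)
```

Every rejected sample is wasted work, and the acceptance rate shrinks as observations become less likely under the prior model. That inefficiency is the motivation for adapting the transition matrices so that only consistent trajectories are drawn in the first place.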

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› Query language to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
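The analogy to regular expressions can be illustrated with a toy example. The sketch below is not Lahar syntax and not its evaluation strategy: it encodes each possible location stream as a string with one character per time step, matches the "Joe got coffee" pattern with an ordinary regular expression, and sums the probabilities of the matching possible worlds. The encoding (O = office, C = coffee room, H = hallway) and the example worlds are hypothetical.

```python
import re

# "Joe got coffee": office, later coffee room, later office again.
JOE_GOT_COFFEE = re.compile(r"O.*C.*O")


def event_probability(worlds, pattern):
    """Sum the probabilities of all possible worlds whose stream matches.

    worlds maps a location stream (one character per time step)
    to the probability of that possible world.
    """
    return sum(p for stream, p in worlds.items() if pattern.search(stream))


# Hypothetical possible worlds for Joe's location over five time steps.
worlds = {
    "OHCHO": 0.5,   # office, hallway, coffee room, hallway, office
    "OHHHO": 0.25,  # never enters the coffee room
    "OHCCC": 0.25,  # never returns to the office
}
print(event_probability(worlds, JOE_GOT_COFFEE))  # 0.5
```

Enumerating possible worlds like this is only feasible for tiny examples, since their number grows exponentially with the stream length.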

Summary

› Models for spatial uncertainty

› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds

› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)

› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

Thanks for listening

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.



  • Managing Uncertainty in Spatial and Spatio-temporal Data
  • Managing Uncertainty in Spatial and Spatio-temporal Data (2)
  • Slide 3
  • Aim of this tutorial hellip
  • Outline
  • Outline (2)
  • Geo-Spatial Data
  • Geo-Spatial Data (2)
  • Slide 9
  • Spatio-Temporal Data
  • Sources of Uncertainty
  • Sources of Uncertainty (2)
  • Research Challenge
  • Research Challenge (2)
  • Research Challenge (3)
  • Uncertain Spatial Data Models
  • Possible World Semantics
  • Answering Queries using PWS
  • Possible Worlds Example II
  • Possible Worlds Example II (2)
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Querying Uncertain Data Complexity
  • Querying Uncertain Data Running Example
  • Data Cleaning Aggregation
  • Equivalent Worlds An intuition
  • Equivalent Worlds An intuition (2)
  • Equivalent Worlds An intuition (3)
  • Equivalent Worlds An intuition (4)
  • Equivalent Worlds An intuition (5)
  • Generating Functions
  • Generating Functions (2)
  • Generating Functions (3)
  • Generating Functions (4)
  • Generating Functions Formally
  • Count Queries on Uncertain Data
  • Count Queries on Uncertain Data (2)
  • Count Queries on Uncertain Data (3)
  • Count Queries on Uncertain Data (4)
  • The Paradigm of Equivalent Worlds
  • The Paradigm of Equivalent Worlds (2)
  • Approximated Results Sampling
  • Sampling Example
  • Sampling Confidences
  • Uncertain Spatial Data Management Summary
  • Outline (3)
  • Slide 52
  • Model of a trajectory
  • Mobility Models
  • Modeling Uncertainty
  • Modeling Uncertain Trajectories
  • Modeling Uncertain Trajectories (2)
  • Modeling Uncertain Trajectories (3)
  • Processing Spatio-Temporal Queries for Uncertain Trajectories
  • Slide 60
  • Continuous NN for trajectories
  • Impact of Uncertainty
  • Structure of the Answer to Probabilistic NN-query for Trajector
  • Instantaneous (ie spatial) NN-query when the querying object
  • Slide 65
  • Slide 66
  • Slide 67
  • The continuous aspect of the probabilistic NN queries
  • The continuous aspect of the probabilistic NN queries (2)
  • The continuous aspect of the probabilistic NN queries (3)
  • Slide 71
  • Slide 72
  • Combine the two probabilitieshellip
  • ConstructionRepresentation
  • Processing
  • Slide 76
  • Temporal-continuous query
  • Uncertainty ndash the flip-side
  • Uncertainty ndash the other-flip-side
  • Outline (4)
  • So far hellip
  • Merge the approaches
  • A highway examplehellip
  • A highway examplehellip (2)
  • A highway examplehellip (3)
  • A highway examplehellip (4)
  • A highway examplehellip (5)
  • A highway examplehellip (6)
  • Slide 89
  • Stochastic Processes
  • A simple example
  • A simple example (2)
  • How can we model this
  • How can we model this (2)
  • Fusion of Model and Reality
  • Slide 96
  • ST - Window Queries
  • ST - Window Queries (2)
  • ST - Window Queries (3)
  • ST - Window Queries (4)
  • ST - Window Queries (5)
  • ST - Window Queries (6)
  • ST - Window Queries (7)
  • Multiple Observations
  • Multiple Observations (2)
  • Multiple Observations (3)
  • Bayesrsquo Theorem
  • Summary
  • Slide 109
  • Indexing UST Data
  • KNN queries + Sampling on UST Data
  • Event queries
  • Summary
  • Slide 114
  • Related Work
  • Related Work (2)
Page 107: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

107

iquest119875 (∎and)119875 ()

119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()

iquest119875 (∎and)

119875 (and∎ )+119875 (andnot∎)

Bayesrsquo Theorem

(0

0 160 04048

0032

) iquest032

032+004=089

rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3

S1-

S2-

S3-

S1+

S2+

S3+

108

Summary

rsaquo Prosndash Allows to answer time-parametrized queries according to

possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse

matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more

validation needed)

rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect

modelling

109

Selected Works

110

Indexing UST Data

rsaquo With the above techniques each object in the database has to be processed

rsaquo Index Structure based on R-Tree indexing the ST-Space

rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)

Poid

time

loca

tion

T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

112

Event queries

› Application: RFID sensors track individuals in an indoor environment

› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office

› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions

Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
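The "Joe got coffee" query can be illustrated by brute force over possible worlds. The sketch below is not Lahar's algorithm or syntax: it assumes a made-up two-location Markov chain (O = office, C = coffee room), encodes every possible location sequence as a string, and sums the probabilities of the sequences that match a regular expression. All names and probabilities are hypothetical.

```python
import re
from itertools import product

# Hypothetical model: Joe starts in his office and moves with probability 0.5.
INITIAL = {"O": 1.0}
TRANSITION = {
    "O": {"O": 0.5, "C": 0.5},
    "C": {"C": 0.5, "O": 0.5},
}
STATES = "OC"


def world_probability(world: str) -> float:
    """Probability of one location sequence under the Markov chain."""
    p = INITIAL.get(world[0], 0.0)
    for prev, cur in zip(world, world[1:]):
        p *= TRANSITION[prev].get(cur, 0.0)
    return p


def event_probability(pattern: str, length: int) -> float:
    """Sum the probabilities of all worlds in which the pattern occurs."""
    regex = re.compile(pattern)
    total = 0.0
    for symbols in product(STATES, repeat=length):
        world = "".join(symbols)
        if regex.search(world):
            total += world_probability(world)
    return total


# office, then one or more ticks in the coffee room, then office again
p_coffee = event_probability(r"OC+O", length=3)
print(p_coffee)  # prints 0.25
```

Enumerating all worlds is exponential in the sequence length, so this only demonstrates the semantics of a sequence query over correlated locations; a practical system has to avoid the enumeration.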

113

Summary

› Models for spatial uncertainty

› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds

› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)

› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.

› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

116

Related Work

› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.

› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.

› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.

Page 111: Managing Uncertainty in  Spatial  and  Spatio -temporal Data

111

KNN queries + Sampling on UST Data

rsaquo Not all queries can be solved as elegant as window queries

rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of

samplesndash Approximate result probability = ratio

of samples that satisfy the query and total number of drawn samples

rsaquo But how to draw samples efficiently such that they are conform with the observations

rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)

112

rsaquo Application RFID sensors track individuals in an indoor environment

rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office

rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions

Event queries

Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728

113

Summary

rsaquo Models for spatial uncertainty

rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds

rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)

rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers

114

Thanks for listening

115

Related Work

› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.

› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.

› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).

› Project Page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.

› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.

› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.

› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127 (2004).

› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.

› [RLBS08] C. Ré, J. Letchner, M. Balazinska, D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
